Preface


This book presents both the traditional and modern aspects and applications of artificial
intelligence and soft computing in a clear and comprehensive style. It provides an in-depth
analysis of mathematical models and algorithms, together with demonstrations of complex
real-life problems in MATLAB. The book contains 15 chapters and six case studies on
industries such as liquid flow process control, machinery process industries, the chemical
industry, biomedical systems, manufacturing processes, and renewable energy. We have
organized the book so that its main objective is the realization of intelligent systems using
these methodologies. The book is distinguished by its contents, its clarity and precision of
presentation, and the overall completeness of its chapters. Understanding the logic requires
a basic understanding of data structures and C. All the simulation results are obtained from
MATLAB code and SIMULINK. The book is therefore useful both for readers interested in
engineering and for research scholars, in terms of both its content and its presentation. It
includes many computer simulations that show how different intelligent control techniques
were used in real-world case studies. The book is organized into 15 chapters as follows.

Chapter 1 gives an overview of intelligent systems and soft computing. It demonstrates the
scope of their application to real-world problems involving complex systems. It
highlights the historical development and how it was polarized in different domains. This
chapter addresses different software tools that enable decision makers to draw on the
knowledge and decision processes of experts in making decisions. Many public accounting
firms have put in place some kind of intelligent system to help auditors and make their
work more efficient. The most commonly used intelligent systems in public accounting are
neural networks, genetic algorithms, and fuzzy logic. Research has led to neural networks
that can learn from and approximate datasets, the genetic algorithm for systematic
random search, and the fuzzy logic controller for approximate reasoning. This chapter
also briefly describes how the book is organized in light of these tools and methods.

Chapter 2 focuses on the practical application of fuzzy logic. Humans and machines 
differ in that human reasoning is ambiguous, imprecise, and fuzzy, whereas machines and 
the computers that power them are predicated on binary logic. Fuzzy logic is a technique 
for increasing the intelligence of machines by giving them the ability to think in a fuzzy 
style, similar to how humans do. A fuzzy controller is a knowledge-based controller in 
which fuzzy logic is used to represent knowledge and logical inference. In this chapter,
the implementation of fuzzy sets and their application in the field of fuzzy logic
controllers are described. Besides the types of membership functions, fuzzification and
defuzzification techniques are also described.

Chapter 3 presents a practical approach to neural network models. Biological systems 
are often compared to intelligent systems when examining how humans carry out control 
functions or make decisions. Many academics are getting more interested in deep learning 
because intelligent systems are becoming more popular and can adapt to new situations. 
Most of the time, neural networks are used to build systems with multiple stages that test 
learning algorithms by controlling their weights and biases. This chapter talks about the 
basics of neural networks. It explains the main parts of the network and how they work
together. With the use of examples, it also covers several kinds of learning algorithms and 
activation functions. In common implementations, these principles are explained in depth. 

Chapter 4 introduces a type of algorithm for intelligent search and optimization
problems that mimics the biological evolutionary process. After a brief introduction
to the components of objective functions and how to solve any objective function using
optimization, different categories of optimization techniques are presented with suitable
examples, followed by the steps of genetic algorithms and their application in different
domains, depending on the characteristics of the constraints. The last part of the
chapter covers differential evolution, one of the most well-known optimization
techniques.

Chapter 5 explains in detail the theoretical background of the adaptive neuro-fuzzy 
inference system (ANFIS) model. Depending upon the learning algorithm, the ANFIS 
model classification and characteristics are also explained graphically. Finally, the overall
steps for estimation on any training and test dataset are highlighted.

Chapter 6 explores four different types of machine learning approaches, including 
supervised learning, unsupervised learning, reinforcement learning, and inductive logic 
programming. Decision tree learning and version space-based learning are heavily
emphasized within the supervised class of machine learning. A sample classification problem
is used to briefly introduce the unsupervised class of learning. Q-learning and temporal
difference learning are examples of the reinforcement learning discussed in this chapter. 
The basic idea and motivation of inductive logic programming are illustrated in the last 
section. 

Chapter 7 analyzes three different bio-inspired optimization techniques. Particle swarm
optimization (PSO), flower pollination algorithm (FPA), and cuckoo search optimization 
(CSO) are used efficiently for estimating the drain and transfer characteristics of two model 
parameters of JFET (Model No: J112A N-channel JFET). In addition, we can successfully 
forecast the JFET's I-V characteristics, which reinforces the approach that has already been 
established. To perform the comparative study among these three algorithms, three criteria 
were chosen, namely RMSE, computational time, and convergence speed. According to the 
results, FPA outperformed the other two algorithms in terms of computational time, con- 
vergence speed, and RMSE by about 3.41% and 2.19% for drain and transfer characteris- 
tics, respectively. 

Chapter 8 discusses a neural network approach to determine the impact of epidemio- 
logical parameters that affect risk factors. Five elements in all, which are categorized as risk 
factors, have been taken into consideration. This neural network model supports under- 
standing and evaluating the impact of these factors on the spread of COVID-19. Also, the 
model establishes a basis for understanding the effect of risk factors and vice versa. In this 
chapter, a total of 162 datasets are used, which contain the input parameters virulence, 
immunity, temperature, and populations; the output is risk value. The model response was 
cross-verified with the actual risk value and found to be satisfactory. In the second phase 
of work, the optimized conditions of epidemiological factors were found to give the fittest 
model of risk factor of the test dataset. 

Chapter 9 discusses the manufacturing of microscopic parts using the electrochemical
discharge micro-machining technique (μ-ECDM). During micro-channel cutting on silica
glass (SiO2, Na2SiO3), parametric effects on the material removal rate (MRR), machining
depth (MD), and overcut (OC) have been investigated using the applied voltage (V),
pulse-on time (s), stand-off distance (SD), and mixed electrolyte concentration (wt%). The
effectiveness of the model has been evaluated using a fuzzy logic-based AI tool, and the
membership based on the magnitude of the input parameters resulting in the highest cutting
depth with the most material removal at the smallest overcut has been determined using
optimization. Results analysis further reveals that employing a mixed electrolyte on a
cylindrical tool and computer-assisted subsystem movement for the x, y, and z axes enhanced
machining depth and surface quality.

Chapter 10 focuses on two different ANFIS models: grid partition and subclustering 
algorithm, which are used for fast and best prediction of average localization error (ALE). 
Transmission range (TR), node density (ND), anchor ratio (AR), and iterations (IT) are
considered as features in a dataset for prediction of ALE. The experimental results show
that both ANFIS models produce the same RMSE of 0.069, but the correlation
coefficient of grid partition outperforms that of the latter ANFIS model.

Chapter 11 discusses three optimization techniques: GA, PSO, and a hybrid algorithm 
of GA and PSO (HGAPSO). For a single diode model and a double diode model of a solar 
cell, the statistical result is correlated with the particle swarm optimization (PSO) and 
genetic algorithm (GA). A comparative study reveals that the upgraded rendition of the 
gray wolf optimization tool provides a more accurate model for predicting the ideal solar 
cell parameters with the fewest possible iterations. Hence, we recommend HGAPSO as
the optimization tool providing the best performance.

Chapter 12 designs an evolutionary method for optimization problems of nonlinear 
systems. Because it is used in the biological field, a cylindrical tank is used here as a non- 
linear example. Its inversion is caused by the way the method proceeds, making it feasible 
to expel the products without any squandering. The closed-loop capabilities of typical PID
controllers are assessed in order to govern the tank's level. Two nature-inspired
optimization techniques, the flower pollination algorithm (FPA) and bacterial foraging
optimization (BFO), are deployed to enhance performance. The work is presented in two
parts: in the first part, the model is represented by two transfer functions, an ordinary
first-order function and a first-order function with time delay; in the second part, proper
tuning of the PID controller parameters for both transfer functions is determined by the
BFO and FPA algorithms. Simulation results show that the flower pollination algorithm is
better than BFO in terms of transient behavior, and that the proposed nature-based
optimization techniques are better than the proportional integral controller.

Chapter 13 talks about the process variables of the liquid flow system, which are often 
changed while the equipment is still running. Therefore, it appears that choosing the right 
level of organizational factors, or interaction effects, is key to gaining the best flow rate. 
The determination of the ideal linear combination quantities in the liquid flow rate mecha- 
nism is the main emphasis of the current study. Input parameters include liquid conductiv- 
ity, pipe diameter, flow sensor output, and viscosity, while response parameters include 
flow rate determined by testing. The process was initially analyzed quantitatively using 
ANOVA. In the next step, a number of newly suggested metaheuristics, such as PSO, CSO, 
and hybrid CSPSO, are used to optimize the liquid process involved parametrically in 
order to maximize the responsiveness in parameters. The suggested approach was cor- 
roborated by the simulated outcomes, which also validated the test. 

Chapter 14 discusses a synthetic minority over-sampling technique (SMOTE) based 
deep neural network for early prediction of diabetes. We used 768 datasets with 17 attri- 
butes to do this research. There are 268 false, true, and positive samples in these datasets, 
and 500 positive samples. In the first stage, SMOTE is used to reduce randomness and
imbalance and to make the oversampled dataset more accurate and stable. In the second stage,
balanced datasets are applied in a deep neural network regression model to train and test
the datasets. In the last step, we used three different ways to measure how well our model
worked: confusion matrix parameters, statistical parameters, and computational time. 

Chapter 15 explains how machine learning algorithms are used to collect evaluations
from the Internet and classify them into six categories for the prediction of human
behavior: walking, walking upstairs, walking downstairs, sitting, standing, and laying. This
chapter identifies the participants' behavior patterns and attempts to gain more insights.
Seven different machine learning classifiers are employed in this work, and their accuracy,
precision, recall, and F1-score are determined to identify the best model. Throughout the
analysis, it is seen that the linear support vector machine (LSVM) shows an average accuracy
of about 97.95%, far better than the other methods.

We are sincerely thankful to the Almighty for supporting and standing with us at all 
times, whether it's good or tough times, and giving us ways to concede. Starting from the 
writing of the chapters till the finalization of the chapters, all the editors gave their contri- 
butions amicably, which is a positive sign of significant teamwork. The editors are sin- 
cerely thankful to all the members of CRC Press, especially Aastha Sharma and Isha Singh, 
for providing constructive input and allowing an opportunity to write this important book. 
We are equally thankful to the reviewers who hail from different places in and around the 
globe who shared their support and stood firm toward quality chapter preparation. 
Introduction to Artificial Intelligence


1.1 Introduction 


With the increasing complexity of systems and the modernization of process plants, a
new approach has been taken which depends on evaluation, generation, planning, and
delivery time. Modern process plants aim for increased efficiency, better product quality,
and growing profits to stay competitive in the global economy. Automation is essential for
economical plant operation through efficient strategies, proper energy utilization, improved
safety, and waste minimization. Automation-based process industries must meet criteria
that include increased product quality, reduced lead times, increased productivity,
improved security, and finally decreased undesirable signs.

Control system design is strongly influenced by the degree of nonlinearity
present in the process. If a linear model is only slightly affected by
nonlinearity in the system, classical controllers can provide satisfactory output over a
wide range of the linear process. However, the presence of a significant
amount of nonlinearity or disturbance makes linear models ineffective away from
the operating point. To compensate for such nonlinearity and disturbance, adaptive
control strategies are used. To compensate for the corresponding variations
in the properties of the process, the adaptive control framework uses feedforward,
feedback, or a combination of the two. An adaptive control technique requires adjustable
controller parameters and an adjustment mechanism, such as gain scheduling of a linear
controller in a process plant affected by minor nonlinearity. But for several nonlinearities
affecting the working condition and fluctuating parameter variations in a process plant,
we need to design a dynamic model to overcome such conditions.
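
As a minimal sketch of the gain-scheduling idea mentioned above, the following Python
fragment selects the gains of a linear PI controller from a lookup table indexed by the
operating point. The schedule values, the region boundaries, and all names are illustrative
assumptions and do not come from any plant discussed in this book.

```python
from bisect import bisect_right

# Hypothetical schedule: (upper bound of operating region, Kp, Ki).
# The numbers are illustrative assumptions only.
GAIN_SCHEDULE = [
    (0.3, 2.0, 0.50),  # low operating region
    (0.7, 1.2, 0.30),  # middle operating region
    (1.0, 0.6, 0.10),  # high operating region
]


def scheduled_gains(operating_point):
    """Return (Kp, Ki) for the region that contains the operating point."""
    bounds = [upper for upper, _, _ in GAIN_SCHEDULE]
    # Points beyond the last bound fall back to the last region.
    index = min(bisect_right(bounds, operating_point), len(GAIN_SCHEDULE) - 1)
    _, kp, ki = GAIN_SCHEDULE[index]
    return kp, ki


class GainScheduledPI:
    """PI controller whose gains follow the measured operating point."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = 0.0

    def update(self, setpoint, measurement):
        kp, ki = scheduled_gains(measurement)
        error = setpoint - measurement
        self.integral += error * self.dt
        return kp * error + ki * self.integral
```

Calling `GainScheduledPI(dt=0.1).update(0.5, 0.1)` uses the gains of the low region because
the measurement lies below the first boundary; a full adaptive scheme would also adjust the
schedule itself.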

The highly nonlinear behavior and time-varying parameters of liquid flow
process systems make these a benchmark for the modeling and control of nonlinear processes.
Nonlinear processes can be modeled in three different ways: mathematical modeling, the
first-principles approach, and system identification based on experimental input-output
data [1, 2]. Dynamic modeling, also called "white box" modeling, uses the laws of
conservation of mass, physical principles, and chemical laws. Its performance always
depends on first principles. White box models [3, 4] are physical models based on
thermodynamics and other scientific relations and design techniques for modeling,
analysis, and control. White box modeling strategies fall short where dynamic models built
on optimistic assumptions (for example, perfect mixing and the absence of measurement
noise) do not depict the real and reasonable behavior of the process.
Nonlinear adaptive control frameworks (white boxes) are incapable of effectively
measuring the process parameters of modern process plants.

Designing a black box model does not require any knowledge of the system structure. The
model can be built from prior information about inputs and outputs, which
is why it is also called a statistical-based model. Preparing a black box model requires
an enormous number of input-output datasets. Since 1990, artificial neural networks
(ANNs), genetic algorithms (GAs), support vector machines, reinforcement learning,
deep learning, and so on have been the most well-known black box modeling
techniques applied in a wide range of nonlinear system applications. The ANN-based
model is inspired by biological neural systems, and it contains a large number of
interconnected nonlinear processing elements known as artificial neurons. The GA-based
model is motivated by bio-inspired operators such as mutation, crossover, and selection.
ANNs and GAs have a great capacity to learn the nonlinear dynamics of complex
processes due to their inherently parallel and distributed architecture. That is why ANN-
and GA-based modeling techniques have been widely explored in the process control
industry.

Dynamic "gray box" models based on system identification can be built using
experimental data obtained from the process. The "gray box" method needs some
understanding of the system apart from the test data. To overcome the
drawbacks of white and black box models, hybrid models have been presented [5]. Gray
box (hybrid) models are a blend of physics-based and statistical-based models.
Gray box modeling is highly accurate for process control systems and performance
improvement.
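
The gray box idea can be illustrated with a small sketch in which the model structure comes
from physics and one unknown parameter is estimated from data. The example below assumes a
hypothetical gravity-drained tank whose level h obeys dh/dt = (q_in - c*sqrt(h)) / AREA; the
tank, its area, and the outflow coefficient c are assumptions chosen only for illustration.

```python
import math

AREA = 1.0  # tank cross-section, assumed known from physics (illustrative value)


def simulate(c, h0, q_in, dt, steps):
    """Euler simulation of dh/dt = (q_in - c*sqrt(h)) / AREA."""
    levels = [h0]
    h = h0
    for _ in range(steps):
        h = max(h + dt * (q_in - c * math.sqrt(h)) / AREA, 0.0)
        levels.append(h)
    return levels


def fit_outflow_coefficient(levels, q_in, dt):
    """Least-squares estimate of c from recorded levels.

    Rearranging the model gives q_in - AREA*dh/dt = c*sqrt(h), which is
    linear in the single unknown c.
    """
    numerator = 0.0
    denominator = 0.0
    for h_now, h_next in zip(levels, levels[1:]):
        y = q_in - AREA * (h_next - h_now) / dt
        x = math.sqrt(h_now)
        numerator += x * y
        denominator += x * x
    return numerator / denominator


# Synthetic "experimental" data generated with a known coefficient.
data = simulate(c=0.4, h0=1.0, q_in=0.2, dt=0.1, steps=50)
c_hat = fit_outflow_coefficient(data, q_in=0.2, dt=0.1)
```

Here the physical structure plays the white box role, while the fitted coefficient plays the
black box role; with noisy measurements the estimate would only approximate the true value.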


1.2 Intelligent Control 


For many modern industrial activities, and even household appliances, the interplay
between imprecision, robustness, and organization of these strategies has led to the
inexorable rise of "intelligent systems" [6-8].

Since the 1970s, scientists have proposed many control systems which incorporate 
different stages, such as modeling, analysis, simulation, implementation, and verification 
[9, 10]. Most of these control strategies have found their way into practice but have not 
received significant attention. In 1977, Fu and Saridis [11] first introduced the concept of
intelligent control. Earlier, in 1962, Zadeh [12] had articulated the intelligent control theorem.

There is no formal or single definition of an intelligent control system. An
intelligent control system ought to pass the famous Turing test, which can be briefly
expressed as follows: the same task is performed by a human and a machine (or a program); if
one cannot distinguish the machine from the human by looking only at their
responses, then the machine is said to be intelligent, and otherwise not. Nonetheless,
intelligent control systems can be broadly described as the application of
artificial intelligence techniques to the design and implementation of automated control
systems. Dr. Fu first presented the term "intelligent control" and started studying
such systems in the 1950s. In the initial phase of the development of intelligent control,
there were close ties between artificial intelligence and automatic control. Advances in
computer science and operations research also contributed to intelligent control during the
years that followed.
1.3 Expert Systems


An expert system (ES) is computer software created to solve complicated issues and offer 
decision-making capabilities similar to those of a human expert [13, 14]. This is accom- 
plished by the system retrieving information from its knowledge base in accordance with 
user queries, utilizing reasoning and inference procedures. 

The first ES, developed in 1970, was the first successful application of artificial
intelligence (AI); expert systems form a subset of AI. By drawing on the knowledge stored
in its knowledge base, an ES can solve even the most complicated problems like an expert.
Like a human expert, the system aids in decision-making for difficult issues by using both
facts and heuristics.

A conventional problem-solving system has both programs and data structures
encoded, whereas an expert system has only the data structures hard-coded and does not have
any problem-specific information encoded in the program. A user interface and an inference
engine are included in the knowledge engineering software used to build the majority of
expert systems [15, 16]. Building a knowledge base is the main obstacle in the creation of
expert systems, and this gap is encouraged. Not all expert systems contain learning com- 
ponents that allow them to adjust to new circumstances or satisfy new demands. But each 
expert system shares the feature that, after being fully constructed, it will be evaluated and 
proved using the same real-world problem-solving scenario. These systems are created for 
a specific industry, like science, medicine, and so on. 
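
The separation between a knowledge base and an inference engine described in this section
can be sketched in a few lines. The following example is a minimal forward-chaining engine;
the rule format and the facts about a tank are hypothetical and serve only as an
illustration, not as the design of any particular expert system shell.

```python
def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base: (set of premises, conclusion).
RULES = [
    (frozenset({"level_high", "inflow_rising"}), "overflow_risk"),
    (frozenset({"overflow_risk"}), "open_drain_valve"),
]

conclusions = forward_chain({"level_high", "inflow_rising"}, RULES)
```

The engine itself contains no problem-specific knowledge; changing the domain only requires
replacing the contents of `RULES`.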


1.4 Soft Computing Techniques 


In a broad sense, intelligent control systems are founded on what is termed "soft
computing." Intelligent systems are an integration of methodologies which provide the
foundation of conceptual design. They balance the precision and certainty of
traditional hard computing systems against the computation, reasoning, and decision-making
of soft computing systems, as shown in Figure 1.1. Moreover, the constituent methodologies
of the basic intelligent system are complementary rather than competitive [17, 18].
Increasingly, these methodologies are also applied in combination, referred to as "hybrid."

To improve AI, soft computing paradigms and their combinations are used; in its computing
processes, AI also incorporates human expert knowledge. Their applications include
but are not limited to controlling complex systems, predicting the unknown parameters of 
geological changes or the world economy, controlling industrial processes, and so on. 
Figure 1.2 represents the basic structure of an intelligent control system. 

The main advantages of soft computing methodologies compared to the analytical 
methods are that they can learn from the framework, they are capable of mapping output 
from input information, and their search space is global instead of local. 

The development of soft computing techniques has attracted significant research
interest since 2010. Soft computing is not a single methodology; rather, it is a
combination of several approaches, namely neural networks, fuzzy logic controllers, and
genetic algorithms. Unlike conventional (hard) computing, soft computing exploits intuition.
The motivation for applying human intuition is that an enormous number of real
problems cannot be solved by conventional computing techniques, either because
they are too complex to grasp or because they cannot be described or
classified by analytical and exact models. Fuzzy set theory describes imprecise concepts
that are hard to express numerically. Careful selection of the rules and membership
functions for a given problem is the major issue for a fuzzy logic controller. Neural
networks, another facet of soft computing, frequently suffer from a slow learning rate.
This disadvantage makes neural networks less suitable for time-critical applications.
The genetic algorithm is based entirely on probabilistic rather than deterministic
search. Table 1.1 shows the list of basic intelligent systems and their advantages.

FIGURE 1.1
Basic diagram of a soft computing system.

FIGURE 1.2
Basic structure of an intelligent control system.

TABLE 1.1
List of Basic Intelligent Systems with Their Advantages

Name of the Basic Intelligent System    Advantage

Neural networks                         Learning and approximation
Genetic algorithms                      Systematic random search
Fuzzy logic                             Approximate reasoning

Nonetheless, in soft computing, the methodologies are complementary rather than
competitive. More precisely, it is worthwhile to use fuzzy logic control, neural networks,
and genetic algorithms in combination rather than in isolation. A fuzzy neural system is a
hybrid intelligent tool incorporating the frameworks of both fuzzy inference and neural
networks, so that their individual strengths are retained. Fuzzy-neural systems have a
topology similar to that of feed-forward neural networks, through which they capture the
approximate reasoning characteristics and imprecise information handling capability of
fuzzy logic systems; in addition, they gain the qualities of adaptation and generalization
through the learning algorithms of neural networks. The genetic-neural strategy removes the
major weakness of applying pure back-propagation to train neural networks.

Soft computing methodologies mimic consciousness and cognition in important ways
that differ from analytical approaches. They can learn from experience, generalize into
domains where direct experience is lacking, and, by means of parallel computer
architectures that simulate biological processes, perform input-to-output mapping more
quickly than inherently serial analytical representations. The anticipated reduction in
computational burden and subsequent rise in calculation rates that enable more robust
control are the driving forces for such an extension. There is a large amount of literature
on soft computing, both theoretical and practical. Section 1.4.1 introduces the concept of fuzzy logic as
well as its applicability to various industrial processes. In Section 1.4.2, the justification as 
well as the rationale for the utilization of neural networks in various industrial applica- 
tions is presented. The evolutionary computation is presented in Section 1.4.3. Section 1.4.4 
is devoted to the integration of soft-computing methodologies commonly called hybrid 
systems. Finally, real-time systems are presented in Section 1.4.5. 


1.4.1 Fuzzy Systems 


Since about 1990, there has been much disagreement and spirited discussion around fuzzy
logic. Zadeh, who is regarded as the father of the field, wrote the first article on fuzzy set
theory, which is today regarded as the foundational study on the topic. In that work, Zadeh
was implicitly advancing the idea of human approximate reasoning, which enables people
to make sound decisions based on the imprecise linguistic information at their disposal
[19-21]. Mamdani carried out the first implementation of Zadeh's concept in 1975,
demonstrating the applicability of fuzzy logic control (FLC) to a small model steam engine.
There are, however, knowledge-based systems and information that cannot be described by
conventional mathematical representations, as noted in Section 1.1. Such pertinent
subjective information is frequently disregarded by designers in the initial stages but is
frequently used to assess designs in the final stages. Systems based on information and
knowledge can be built using fuzzy logic. The so-called knowledge-based technique is
considerably more in line with how people actually think and speak than conventional
classical reasoning is.

FIGURE 1.3
Functional diagram of fuzzy system.

Fuzzy sets and the fuzzy logic theory that turns them into a useful standard
logic are shown in Figure 1.3. Objective knowledge exists in a mathematical form that is
applied in some engineering problems, and subjective knowledge exists in a linguistic
form which is for the most part impossible to quantify [22]. Fuzzy logic systems have
been utilized for modeling and have been applied to many real-time system problems, as
shown in Figure 1.4.


1.4.1.1 Architecture of Fuzzy Logic Systems 


Input variables: Crisp values are passed to the fuzzification section.

Fuzzifier: The fuzzifier transforms crisp measured data into suitable linguistic
values.

Fuzzy inferencing: This is the brain of an FLC, which arrives at the desired control
strategy by using human knowledge to perform approximate reasoning.

Fuzzy rule base: This holds the empirical knowledge supplied by a process domain expert.

Defuzzifier: This produces a non-fuzzy (crisp) decision from the fuzzy control action
inferred by the inference system.

Output variables: The defuzzification section converts the output to crisp values. There are a
number of books related to fuzzy logic [5, 19, 23-25].
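
The chain of fuzzifier, rule base, inference, and defuzzifier listed above can be sketched
as follows. This is a deliberately simplified illustration: it assumes triangular
membership functions, three hypothetical linguistic values for a single input, singleton
rule consequents, and weighted-average defuzzification rather than a full Mamdani
controller.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with vertices left, peak, right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic values of the input "error".
INPUT_SETS = {
    "negative": (-2.0, -1.0, 0.0),
    "zero": (-1.0, 0.0, 1.0),
    "positive": (0.0, 1.0, 2.0),
}

# Hypothetical rule base: IF error is <label> THEN output is <singleton>.
RULE_BASE = {"negative": -1.0, "zero": 0.0, "positive": 1.0}


def fuzzify(x):
    """Map a crisp input to membership degrees of each linguistic value."""
    return {label: triangle(x, *params) for label, params in INPUT_SETS.items()}


def fuzzy_controller(x):
    """Fire every rule and defuzzify by the weighted average of consequents."""
    degrees = fuzzify(x)
    total = sum(degrees.values())
    if total == 0.0:
        return 0.0  # input outside every fuzzy set: no rule fires
    return sum(degrees[label] * RULE_BASE[label] for label in degrees) / total
```

For an input of 0.25, the "zero" and "positive" sets are both partially active, so the crisp
output lies between their consequents.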


1.4.2 Neural Networks 


Neural networks attempt to replicate the natural functions of the human brain. A neural
network is an information processing model [26, 27] inspired by biological nervous systems.
It is fundamentally composed of a large number of highly interconnected processing
elements that work together to solve a specific problem. A neural network is configured
for a specific application through a learning process, which involves
adjustments to the synaptic connections between the neurons. The basic architecture and
framework for back-propagation of an ANN model are shown in Figure 1.5.

FIGURE 1.4
Basic framework of fuzzy logic controller.


1.4.2.1 Basic Architecture of Neural Network 


Feed forward networks: In this model, data proceeds in only one direction [28]. Feedback
loops are absent, so the output of a layer does not affect that same
layer. Feed forward neural networks are in general straightforward networks
that associate inputs with outputs. They are extensively used in pattern
recognition.

Feedback networks: Feedback networks allow signals to travel in both directions with
the help of loops [29]. This approach is extremely powerful and can become
exceedingly complicated. Feedback networks are dynamic, and
their state keeps changing until it reaches an equilibrium point. They are used in the
process control industry, forecasting of ecological parameters, and so forth.

FIGURE 1.5
Framework for back-propagation of an ANN.


Although single layer feed forward networks, multilayer feed forward networks, and 
recurrent networks are the three major categories for neural network architectures, 
numerous different neural network structures have developed over time. Back- 
propagation networks, perceptrons, adaptive linear elements (ADALINEs), associative 
memory, Boltzmann machines, adaptive resonance theory, self-organizing feature maps, 
and Hopfield networks are a few of the well-known neural network systems. 

To name a few, pattern recognition, image processing, data compression, forecasting, 
and optimization challenges have all been effectively solved using neural networks. 
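
The feed-forward pass and the gradient-based weight update at the heart of back-propagation
can be sketched for the simplest possible case, a single sigmoid neuron with no hidden
layer. The task (learning the logical OR function), the learning rate, and the number of
epochs are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, inputs):
    """Feed-forward pass of a single sigmoid neuron."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=5000, learning_rate=1.0):
    """Gradient descent on the squared error, one sample at a time."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = forward(weights, bias, inputs)
            # Gradient of 0.5*(output - target)**2 with respect to the net
            # input of the neuron (chain rule through the sigmoid).
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias


OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
trained_weights, trained_bias = train(OR_SAMPLES)
```

In a multilayer network the same error gradient is propagated backward through each hidden
layer; the single-neuron case shown here omits that step.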
1.4.3 Genetic Algorithms


Genetic algorithms are stochastic search optimizers based on the concepts of
evolution and natural selection. GAs [30, 31] are inherently parallel algorithms,
which can exploit present-day parallel supercomputers to speed up the optimization
task by a factor close to the number of parallel processors. GAs perform very well
on multidimensional function domains, problems with discrete solution spaces,
and nondifferentiable objective functions. Global optimizers are generally independent of
the solution domain, which is why they are not troubled by the aforementioned constraints
of the optimization problems. Genetic algorithms are the most suitable to deal with such issues.

Because of the previously mentioned favorable circumstances of GAs, scientists have 
recently begun to utilize them for parameter enhancement of the procedure control indus- 
try. The basic framework of a genetic algorithm is shown in Figure 1.6. 

Although binary coding of the problem parameters is used in the majority of GA simula- 
tions, real coding of the parameters has also been proposed and used. A wide range of 
scientific and engineering fields, including function optimization, machine learning, sched- 
uling, and others, have found extensive applications for GAs, which have been theoreti- 
cally and empirically demonstrated to enable robust search in a complicated space [22, 32]. 


[Flowchart steps: set the GA parameters; find the fitness value of each chromosome; crossover of the parents' chromosomes; new population generation; end.]
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FIGURE 1.6 
Basic framework of genetic algorithm. 
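
The loop in Figure 1.6 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the book's MATLAB implementation: the toy fitness function (number of 1 bits), the population size, tournament selection, elitism, and the mutation rate are all hypothetical choices made for brevity.

```python
import random

def fitness(chrom):
    # Toy objective (assumed for illustration): number of 1 bits.
    return sum(chrom)

def run_ga(n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Set the GA parameters and create a random binary-coded population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: keep the best chromosome found so far
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Single-point crossover of the parents' chromosomes.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best

best = run_ga()
print(fitness(best), best)
```

Because the best chromosome is always copied into the next generation, the best fitness in this sketch can never decrease from one generation to the next.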
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1.4.4 Adaptive Neuro-Fuzzy Inference System 


An adaptive neuro-fuzzy inference system (ANFIS) [33, 34] is an intelligent system
that implements a neural network based on the Takagi-Sugeno fuzzy inference
system, and was first presented in the 1990s [35]. In the adaptive neuro-fuzzy
system, neural networks provide effective learning algorithms to improve tuning of the
membership functions and the associated rules of fuzzy systems.

There are some basic aspects of these AI techniques which require better understanding, 
more specifically: 


* No standard methods exist for transforming human knowledge or experience
into the rule base and database of a fuzzy inference system.


* There is a need for effective methods for tuning the membership functions (MFs) so
as to minimize the output error measure or maximize a performance index.


In this perspective, this novel architecture called ANFIS serves as a basis for constructing a 
set of fuzzy if-then rules with an appropriate membership function to generate the stipu- 
lated input-output pairs. 

Fuzzy systems and neural networks are combined to exploit their respective advantages
and to offset their individual shortcomings. Neural networks bring their computational
(learning) capabilities to fuzzy systems, and in return gain the transparency and
interpretability of fuzzy system representations. Several important characteristics of
ANFIS allow the system to perform demanding tasks through fuzzy rules: accurate
learning, ease of implementation, good explanation facilities, and excellent
generalization capabilities. Most of the main applications are in system control.
Applications also appear in fields such as data classification, data analysis,
decision-making, and fault detection. The basic framework of ANFIS is
shown in Figure 1.7.

ANFIS has been used in numerous time series research areas, including the application
of ANFIS based on singular spectrum analysis for forecasting chaotic time series, chaotic 
time series prediction using improved ANFIS, fuzzy time series forecasting, developing a 
new method for predicting the trends of oil prices, predicting stock returns, and predicting 
financial volatility. ANFIS is ultimately superior to the other approaches. 


1.4.5 Real-Time Systems 


Real-time systems must respond to events as they arise, and real-time embedded
software systems raise issues that distinguish them from other software systems.
In a real-time system, the logical results of the computations and the
physical instant at which these results are produced both affect how the system behaves.
Real-time systems are categorized from a variety of angles, including aspects inside and
outside the computer system. Real-time systems, both hard and soft, are given particular
attention in this book.

In soft real-time systems, a missed deadline can result in a sizable loss, whereas in hard
real-time systems it is disastrous. Therefore, in these systems, predictability of system
behavior is of utmost importance. The key feature of real-time systems is the
timely response requirement. Various software design methods have been
proposed for identifying parallel activities and their execution through knowledge of
the planned system. However, the vast majority of these methodologies depend on
heuristics, in that thorough knowledge of the proposed system, the choice of
programming language, and the execution characteristics of the target architecture play key
roles in decomposing the system into concurrent tasks. Control of the chemical
concentration in a reaction chamber and all types of process control systems are typical
examples of real-time systems.

FIGURE 1.7
Flowchart of ANFIS models.
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Practical Approach of Fuzzy Logic Controller 


2.1 Introduction 


Problems in the real world frequently turn out to be complicated because of a degree of
uncertainty in the parameters that define the problem or in the circumstances in which it
arises [1-3]. Although the probabilistic approach is a tried-and-true method for dealing
with uncertainty, it can only be used where the characteristics are based on
random processes, that is, where the occurrence of events is determined solely by
chance [4]. In reality, there is a sizable class of problems in which the uncertainty is
characterized by a nonrandom process. In such cases, the uncertainty may be
caused by incomplete knowledge of the problem, information that is not entirely
reliable, linguistic imprecision in the problem's definition, or conflicting
information from several sources [5, 6]. In these circumstances, fuzzy set theory shows
tremendous promise for effectively handling the uncertainty. Fuzziness means
vagueness. Fuzzy set theory is a good mathematical tool for handling the uncertainty
that arises from vagueness [7]. Fuzziness frequently appears when recognizing
handwritten characters or understanding spoken language [8-12]. This chapter first
describes the properties and operations of classical sets.




2.2 Classical Set Properties and Operation 
2.2.1 Classical Set 


1. A classical set is defined by crisp boundaries, meaning there is no uncertainty in the
prescription or location of the set's boundaries [13, 14].


2. The universe of discourse is the space of all information available for a
specific problem. X is typically used to represent the universe of discourse, while x
represents each element in the universe.

3. The total number of elements in the universe X is called its cardinal number,
which is denoted by $n_X$.

4. Collections of elements within the universe are called sets, and collections of
elements within a set are called subsets. The collection of all elements in the
universe is called the whole set.

5. The null set ($\emptyset$) is the set containing no elements. The power set P(X) is
a special set that contains all possible subsets of X. The cardinality of the power set,
denoted by $n_{P(X)}$, is equal to $2^{n_X}$.

Example: Consider a universe of discourse X = {a, b, c}. Calculate the cardinal number of X,
the power set, and the cardinality of the power set.


Solution: Cardinal number = number of elements in the universe = $n_X = 3$
Power set, $P(X) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}$
Cardinality of power set = $n_{P(X)} = 2^{n_X} = 2^3 = 8$
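
The example above can be checked with a short Python sketch. The helper name `power_set` is hypothetical; it simply enumerates every subset of the universe, from the null set up to the whole set.

```python
from itertools import chain, combinations

def power_set(universe):
    # Enumerate subsets of every size r = 0, 1, ..., n.
    items = list(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [set(s) for s in subsets]

X = ["a", "b", "c"]
P = power_set(X)
print("cardinal number of X:", len(X))
print("cardinality of P(X):", len(P))
for subset in P:
    print(subset)
```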


2.3 Properties of Crisp Sets 


The properties of sets play a significant part in mathematical operations, since they
determine how set expressions can be manipulated [15-17]. The following are important
properties of classical sets:


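1. Commutativity: $A \cap B = B \cap A$
$A \cup B = B \cup A$

2. Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$
$A \cap (B \cap C) = (A \cap B) \cap C$

3. Distributivity: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

4. Idempotency: $A \cup A = A$
$A \cap A = A$

5. Identity: $A \cup \emptyset = A$ and $A \cap X = A$
$A \cap \emptyset = \emptyset$ and $A \cup X = X$

6. Transitivity: If $A \subseteq B \subseteq C$, then $A \subseteq C$


In this case the symbol "$\subseteq$" means contained in or equivalent to, and "$\subset$" means
contained in.


7. Involution: $\overline{\overline{A}} = A$


De Morgan's laws and the excluded middle laws are the other two significant
special properties.


8. Law of excluded middle: It represents the union of a set A and its complement.
$A \cup \overline{A} = X$
9. Law of contradiction: It represents the intersection of a set A and its
complement.
$A \cap \overline{A} = \emptyset$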


2.4 Concept of Fuzziness 


To explain fuzziness, we use an example that makes it easier to understand the basic
difference between Boolean logic and fuzzy logic [18].
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2.4.1 Fuzzy Set 


1. In a classical set, an element in the universe changes abruptly and clearly from
being a member of a particular set to not being a member of that set. With a fuzzy
set, this transition is gradual, since the boundaries of the fuzzy set are vague and
ambiguous [19].


2. A fuzzy set is a set whose elements have varying degrees of membership.


3. Fuzzy sets are usually denoted by A. The membership function $\mu_A$ maps elements
of the universe to a real value on the interval 0 to 1. If an element in the universe,
say x, is a member of fuzzy set A, then this mapping is given by $\mu_A(x) \in [0, 1]$.
The mapping is shown in Figures 2.1 and 2.2 for a typical fuzzy set [20].


[Diagram: for the question "Is Ram honest?", Boolean logic allows only True/Yes/Logic 1 or False/No/Logic 0, whereas fuzzy logic allows graded answers: extremely honest (1), very honest (0.8), sometimes honest (0.4), extremely dishonest (0).]


FIGURE 2.1 
Examples of fuzzy set and Boolean set. 




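FIGURE 2.2
Mapping of a fuzzy set.

A fuzzy set A with a discrete and finite universe of discourse X is denoted by

$$A = \left\{ \frac{\mu_A(x_1)}{x_1} + \frac{\mu_A(x_2)}{x_2} + \frac{\mu_A(x_3)}{x_3} + \cdots \right\} = \left\{ \sum_i \frac{\mu_A(x_i)}{x_i} \right\}$$

When the universe of discourse X is continuous and infinite, fuzzy set A is denoted
by

$$A = \left\{ \int \frac{\mu_A(x)}{x} \right\}$$

In both notations the horizontal bar is a delimiter, not a quotient, and the summation and
integral signs denote the collection of elements rather than algebraic addition.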


2.4.2 Operation of Fuzzy Sets [21, 22] 
2.4.2.1 Union 


Let A and B be two fuzzy sets on the universe X. The union of the sets is defined
by

$$\mu_{A \cup B}(x) = \mu_A(x) \vee \mu_B(x) = \max(\mu_A(x), \mu_B(x))$$

Let A = {(0.1, 1), (0.5, 10), (1, 50)}
B = {(1, 1), (0.2, 10), (0.8, 50)}

$A \cup B = \max(A, B)$
= {(1, 1), (0.5, 10), (1, 50)}


2.4.2.2 Intersection 


Let A and B be two fuzzy sets on the universe X. The intersection of the sets is defined
by

$$\mu_{A \cap B}(x) = \mu_A(x) \wedge \mu_B(x) = \min(\mu_A(x), \mu_B(x))$$

Let A = {(0.1, 1), (0.5, 10), (1, 50)}
B = {(1, 1), (0.2, 10), (0.8, 50)}

$A \cap B = \min(A, B)$
= {(0.1, 1), (0.2, 10), (0.8, 50)}
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2.4.2.3 Complement 


The complement of fuzzy set A can be expressed as $\mu_{\overline{A}}(x) = 1 - \mu_A(x)$
Let A = {(0.1, 1), (0.5, 10), (1, 50)}
Complement of A = $\overline{A}$ = {(0.9, 1), (0.5, 10), (0, 50)}

2.4.2.4 Difference 


The difference of a fuzzy set A with respect to B, denoted by A | B, is the intersection of A
and the complement of B. It can be expressed as

$$\mu_{A | B}(x) = \mu_A(x) \wedge \mu_{\overline{B}}(x) = \min(\mu_A(x), \mu_{\overline{B}}(x))$$
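
The four operations can be sketched in Python on the example sets A and B used above. The representation is an assumption made for illustration: each fuzzy set is a dictionary from element to membership value, and both sets are defined on the same elements. The function names are hypothetical.

```python
def fuzzy_union(a, b):
    # Membership in the union is the maximum of the two memberships.
    return {x: max(a[x], b[x]) for x in a}

def fuzzy_intersection(a, b):
    # Membership in the intersection is the minimum of the two memberships.
    return {x: min(a[x], b[x]) for x in a}

def fuzzy_complement(a):
    # Rounded only to hide floating-point noise such as 0.19999999999999996.
    return {x: round(1 - m, 10) for x, m in a.items()}

def fuzzy_difference(a, b):
    # A | B is the intersection of A with the complement of B.
    return fuzzy_intersection(a, fuzzy_complement(b))

# Keys are elements of the universe, values are membership grades.
A = {1: 0.1, 10: 0.5, 50: 1.0}
B = {1: 1.0, 10: 0.2, 50: 0.8}

print("union:       ", fuzzy_union(A, B))
print("intersection:", fuzzy_intersection(A, B))
print("complement A:", fuzzy_complement(A))
print("difference:  ", fuzzy_difference(A, B))
```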


2.4.3 Properties of Fuzzy Sets 


With the exception of the excluded middle laws, the properties of fuzzy sets are
identical to those of crisp sets. Because a fuzzy set and its complement can overlap,
the excluded middle laws do not hold for fuzzy sets [23-25].


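1. Commutativity: $A \cap B = B \cap A$
$A \cup B = B \cup A$
2. Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$
$A \cap (B \cap C) = (A \cap B) \cap C$

3. Distributivity: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

4. Idempotency: $A \cup A = A$ and $A \cap A = A$
5. Identity: $A \cup \emptyset = A$ and $A \cap X = A$
$A \cap \emptyset = \emptyset$ and $A \cup X = X$

6. Transitivity: If $A \subseteq B \subseteq C$, then $A \subseteq C$


In this case the symbol "$\subseteq$" means contained in or equivalent to, and "$\subset$" means
contained in.


7. Involution: $\overline{\overline{A}} = A$
8. Excluded middle law: does not hold
$A \cup \overline{A} \neq X$

The law of contradiction is not satisfied either:
$A \cap \overline{A} \neq \emptyset$

9. De Morgan's law: $\overline{A \cap B} = \overline{A} \cup \overline{B}$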
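
A small numerical check makes the failure of the excluded middle laws concrete. The membership values below are hypothetical and chosen only for illustration.

```python
# Hypothetical fuzzy set on a three-element universe.
A = {"x1": 0.2, "x2": 0.5, "x3": 0.9}
A_bar = {x: 1 - m for x, m in A.items()}

# Union and intersection of A with its own complement.
union = {x: max(A[x], A_bar[x]) for x in A}
intersection = {x: min(A[x], A_bar[x]) for x in A}

# For a crisp set every union value would be 1 and every intersection value 0.
excluded_middle_holds = all(m == 1 for m in union.values())
contradiction_holds = all(m == 0 for m in intersection.values())

print(union, excluded_middle_holds)
print(intersection, contradiction_holds)
```

Because every element here belongs partly to A and partly to its complement, the union falls short of the whole universe and the intersection is not empty.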
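PROBLEM
Let the fuzzy sets A and B be defined as follows:

$$A = \left\{ \frac{1}{2} + \frac{0.5}{3} + \frac{0.3}{4} + \frac{0.2}{5} \right\} \quad \text{and} \quad B = \left\{ \frac{0.5}{2} + \frac{0.7}{3} + \frac{0.2}{4} + \frac{0.4}{5} \right\}$$

Calculate the union, intersection, complement, and difference.


Solution
Union
$$A \cup B = \max(A, B) = \left\{ \frac{1}{2} + \frac{0.7}{3} + \frac{0.3}{4} + \frac{0.4}{5} \right\}$$
Intersection
$$A \cap B = \min(A, B) = \left\{ \frac{0.5}{2} + \frac{0.5}{3} + \frac{0.2}{4} + \frac{0.2}{5} \right\}$$
Complement
$$\overline{A} = 1 - A = \left\{ \frac{0}{2} + \frac{0.5}{3} + \frac{0.7}{4} + \frac{0.8}{5} \right\}$$
$$\overline{B} = 1 - B = \left\{ \frac{0.5}{2} + \frac{0.3}{3} + \frac{0.8}{4} + \frac{0.6}{5} \right\}$$
Difference
$$A | B = A \cap \overline{B} = \min(A, \overline{B}) = \left\{ \frac{0.5}{2} + \frac{0.3}{3} + \frac{0.3}{4} + \frac{0.2}{5} \right\}$$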
PROBLEM 


Consider two fuzzy sets X and Y given by 


A [01,05,2,06,04] 
E i 7 


Evaluate: (a) X U Y,(b) X ^ Y, © Š, (d) X UY, (e) X v Y 


Solution 


(a) XuY | 
4 2 7 9 1 
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>] 


(c) 


pop Ol 04.05 02 1 
4 q^ AE aa 


(d) Koma Gne [ 91,05, 1,06,1] 


4 2.79. 


Y 21. Y- 0.9 0.5 0 04 0.6 
4 2 Ww 9 1 


Hence, X UY = max [XY ) - 0.9 | 0.6 | 0.5 0.8 | 0.6 
4 2 7 9 1 


2.4.4 Comparison between Crisp Set or Classical Set and Fuzzy Set 
1. In contrast to fuzzy sets, which allow any value from 0 to 1, crisp sets allow only
the values 0 and 1 (i.e., yes for 1 and no for 0).
2. Crisp sets have precise properties; fuzzy sets have imprecise properties. 


3. In a crisp set items have a full membership but in fuzzy sets they have a partial 
membership. 


4. The laws of excluded middle and noncontradiction hold for crisp sets, but in the 
case of fuzzy sets they do not hold. 


5. A crisp set is similar to Boolean logic (either 0 or 1), but fuzzy sets capture the 
degree to which something is true. 


6. In a crisp set with crisp boundaries, there is no confusion as to where the set bor- 
ders are located, while in a fuzzy set with ambiguous bounds, there is uncertainty 
as to where the set limits are located [26]. 


2.4.5 Composition of Fuzzy Set 


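1. Max-Min Composition
Let two fuzzy relations R and S be defined on sets A, B, and C such that

$$R \subseteq A \times B, \quad S \subseteq B \times C$$

Then the composition $T = R \circ S$ is a relation from A to C:

$$\mu_T(x, z) = \mu_{R \circ S}(x, z) = \max_{y} \left[ \min\left( \mu_R(x, y), \mu_S(y, z) \right) \right]$$

$$\mu_{R \circ S}(x, z) = \bigvee_{y} \left[ \mu_R(x, y) \wedge \mu_S(y, z) \right]$$

2. Max-Product Composition
The max-product composition of R(x, y) and S(y, z) is

$$\mu_T(x, z) = \mu_{R \circ S}(x, z) = \max_{y} \left[ \mu_R(x, y) \cdot \mu_S(y, z) \right]$$

$$\mu_{R \circ S}(x, z) = \bigvee_{y} \left[ \mu_R(x, y) \cdot \mu_S(y, z) \right]$$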
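
Both compositions can be sketched in Python with relations stored as nested lists (rows indexed by x, columns by y for R; rows by y, columns by z for S). The relation values below are hypothetical and serve only to illustrate the two rules.

```python
def max_min(R, S):
    # T[i][j] = max over k of min(R[i][k], S[k][j])
    return [
        [max(min(R[i][k], S[k][j]) for k in range(len(S)))
         for j in range(len(S[0]))]
        for i in range(len(R))
    ]

def max_product(R, S):
    # T[i][j] = max over k of R[i][k] * S[k][j]
    return [
        [max(R[i][k] * S[k][j] for k in range(len(S)))
         for j in range(len(S[0]))]
        for i in range(len(R))
    ]

# Hypothetical relations: R on A x B (2 x 2), S on B x C (2 x 3).
R = [[0.7, 0.5],
     [0.8, 0.4]]
S = [[0.9, 0.6, 0.2],
     [0.1, 0.7, 0.5]]

T_min = max_min(R, S)
T_prod = max_product(R, S)
print(T_min)
print(T_prod)
```

The two rules share the outer maximum over y and differ only in how each pair of memberships is combined, by minimum or by product.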


2.4.6 Properties of Fuzzy Composition 


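1. $R \circ S \neq S \circ R$
2. $(R \circ S)^{-1} = S^{-1} \circ R^{-1}$
3. $(R \circ S) \circ M = R \circ (S \circ M)$


Equivalence relation: A relation R is an equivalence relation if the following three
properties are satisfied:

1. Reflexivity: $\mu_R(x_i, x_i) = 1$, that is, $(x_i, x_i) \in R$

2. Symmetry: $\mu_R(x_i, x_j) = \mu_R(x_j, x_i)$, that is, $(x_i, x_j) \in R$ implies $(x_j, x_i) \in R$

3. Transitivity: If $\mu_R(x_i, x_j) = \lambda_1$ and $\mu_R(x_j, x_k) = \lambda_2$, then $\mu_R(x_i, x_k) = \lambda$,
where $\lambda \geq \min[\lambda_1, \lambda_2]$.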


2.4.7 Classical Tolerance Relation 


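The tolerance relation $R_1$ on universe X is one where only the properties of reflexivity and
symmetry are satisfied; it is also called a proximity relation [22, 25]. A tolerance
relation can be converted into an equivalence relation by at most (n - 1) compositions
with itself, where n is the cardinality of X:

$$R_1^{n-1} = R_1 \circ R_1 \circ \cdots \circ R_1 = R \text{ (equivalence relation)}$$

[$R_1$ = fuzzy tolerance relation]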


2.4.8 Features of Membership Function 


A membership function identifies the degree of truthfulness. The membership function of
any crisp set takes only the values 1 or 0. For a fuzzy set, the membership function
takes values between 0 and 1 [26].


2.4.8.1 Fuzzy Set 


A fuzzy set is an extension of a classical set: $A = \{(x, \mu_A(x)) \mid x \in X\}$,
where X is the universe of discourse and $\mu_A(x)$ is the membership function of x.
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2.4.8.2 Features of Fuzzy Sets 


FIGURE 2.3
Basic parameters of a membership function [7].


(a) Core: those elements x of the universe such that $\mu_A(x) = 1$
(b) Support: those elements x of the universe such that $\mu_A(x) > 0$
(c) Boundary: those elements x of the universe such that $0 < \mu_A(x) < 1$
(d) Core = support - boundary
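
These three features can be read directly off a discrete membership function. The sketch below uses a hypothetical fuzzy set stored as a dictionary from element to membership value.

```python
# Hypothetical discrete fuzzy set: element -> membership value.
A = {1: 0.0, 2: 0.3, 3: 1.0, 4: 1.0, 5: 0.6, 6: 0.0}

core = {x for x, m in A.items() if m == 1}          # full membership
support = {x for x, m in A.items() if m > 0}        # nonzero membership
boundary = {x for x, m in A.items() if 0 < m < 1}   # partial membership

print("core:", core)
print("support:", support)
print("boundary:", boundary)
```

In this sketch the support is the union of the core and the boundary, so removing the boundary from the support leaves the core.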


2.4.8.3 Classification of Fuzzy Sets 


1. Normal fuzzy set

A fuzzy set with at least one element x whose membership value is unity (Figure 2.4).


2. Subnormal fuzzy set

A fuzzy set in which no element has a membership value equal to unity (Figure 2.5).


3. Convex fuzzy set

A fuzzy set whose membership function is monotonically increasing and
then monotonically decreasing. For elements x < y < z (Figure 2.6):

$$\mu_A(y) \geq \min[\mu_A(x), \mu_A(z)]$$


4. Non convex fuzzy set

The opposite of a convex fuzzy set: its membership function does not follow this
pattern of monotonic increase followed by monotonic decrease
(Figure 2.7).
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FIGURE 2.4 
Simple diagram of normal fuzzy set [7].


FIGURE 2.5 
Simple diagram of subnormal fuzzy set [7]. 


5. Crossover point
An element x whose membership value is 0.5 is called a crossover point of the
membership function (Figure 2.8).

6. Height of a fuzzy set
The maximum value of the membership function over all values of x is known as the height
of the fuzzy set: height(A) = $\max_x [\mu_A(x)]$ (Figure 2.9).
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FIGURE 2.6 
Simple diagram of convex fuzzy set [7]. 


FIGURE 2.7 
Simple diagram of non convex fuzzy set [7]. 


FIGURE 2.8 
Simple diagram of crossover fuzzy set [7]. 
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FIGURE 2.9 
Simple diagram of height of a fuzzy set [7]. 


2.5 Fuzzification 


Fuzzification is the procedure used to change a crisp value into a fuzzy value, that is,
an existing crisp value into a linguistic value [27-29]. As an example, suppose we want to
convert the crisp temperature 10°C into a fuzzy value (Figure 2.10).

Methods of membership value assignment:


1. Intuition
2. Inference
3. Rank ordering
4. Angular fuzzy sets
5. Neural network
6. Genetic algorithm
7. Inductive reasoning


2.5.1 Intuition
Membership functions are designed using human intelligence and understanding (Figure 2.11).


[Diagram: crisp quantity → fuzzification → fuzzy value / linguistic value (cool or warm).]


FIGURE 2.10
Simple diagram of fuzzification [7].


[Diagram: membership functions labeled very unimportant, unimportant, ordinary, important, and very important.]


FIGURE 2.11
Simple diagram of fuzzy intuition [8].


2.5.2 Inference 


Inference uses knowledge to perform reasoning. Here, knowledge of geometrical shapes is
used to assign membership values. (Triangular, trapezoidal, bell-shaped, and Gaussian
functions are several common membership function shapes.) Let A, B, C be the interior
angles of a triangle, with A ≥ B ≥ C ≥ 0 and A + B + C = 180°.

Here we define five types of triangles:


1. R = Approximately right angle triangle

2. I = Approximately isosceles triangle

3. E = Approximately equilateral triangle

4. IR = Approximately isosceles and right angle triangle
5. T = Other types of triangle

1. Membership value of right angle triangles:

$$\mu_R(A, B, C) = 1 - \frac{1}{90} |A - 90|$$

Example: (A, B, C) = (80, 65, 35)

$$\mu_R(A, B, C) = 1 - \frac{1}{90} |80 - 90| = 1 - \frac{1}{9} = \frac{8}{9}$$


2. Membership value of isosceles triangles:

$$\mu_I(A, B, C) = 1 - \frac{1}{60} \min(A - B, B - C)$$

Let (A, B, C) = (80, 65, 35). Then

$$\mu_I(A, B, C) = 1 - \frac{1}{60} \min(80 - 65, 65 - 35) = 1 - \frac{1}{60} \min(15, 30) = 1 - \frac{15}{60} = \frac{3}{4}$$

3. Membership value of equilateral triangles:

$$\mu_E(A, B, C) = 1 - \frac{1}{180} (A - C)$$

Let (A, B, C) = (80, 65, 35). Then

$$\mu_E(A, B, C) = 1 - \frac{1}{180} (80 - 35) = 1 - \frac{45}{180} = \frac{3}{4}$$


4. Membership value of isosceles and right angle triangles:

$$IR = I \cap R$$

$$\mu_{IR}(A, B, C) = \min(\mu_I(A, B, C), \mu_R(A, B, C))$$

Let (A, B, C) = (80, 65, 35). Then

$$\mu_{IR}(A, B, C) = \min\left( \frac{3}{4}, \frac{8}{9} \right) = \frac{3}{4}$$


5. Membership value of other types of triangle:

$$T = \overline{R \cup I \cup E} = \overline{R} \cap \overline{I} \cap \overline{E}$$

Let (A, B, C) = (80, 65, 35), so that $\mu_R = \frac{8}{9}$, $\mu_I = \frac{3}{4}$, and $\mu_E = \frac{3}{4}$. Then

$$\mu_T = \min(1 - \mu_R, 1 - \mu_I, 1 - \mu_E) = \min\left( \frac{1}{9}, \frac{1}{4}, \frac{1}{4} \right) = \frac{1}{9}$$

2.5.3 Rank Ordering 


In rank ordering, preferences are obtained by polling (pairwise comparison), and
membership values are assigned from the resulting ordering.


Name of the Car   BMW   Benz   Jaguar   Audi   Total   %

BMW               -     51     54       52     157     26.03
Benz              48    -      47       84     179     32.66
Jaguar            46    62     -        14     122     20.33
Audi              45    59     47       -      145     24.04


Based on the percentage, membership values are assigned (Figure 2.12).


2.5.4 Angular Fuzzy Sets 


These differ from standard fuzzy sets in their coordinate description: they are defined
on a universe of angles, so the shapes repeat every $2\pi$.
Example: pH value of a water sample from a contaminated pond.


a. If the pH is 7, it is a neutral solution.

b. A pH between 7 and 14 is classified as absolutely basic (AB), very basic (VB), basic (B),
or medium basic (MB), each assigned a positive angle $\theta$.

c. A pH between 0 and 7 is classified as medium acidic (MA), very acidic (VA),
or acidic (A), each assigned a negative angle $\theta$.

d. Membership function: $\mu_t(\theta) = t \tan(\theta)$.


[Bar chart: membership value for BMW, Benz, Jaguar, and Audi.]


FIGURE 2.12 
Simple diagram of rank ordering membership function [8]. 
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2.5.5 Neural Network 


Here, $R_A$, $R_B$, and $R_C$ are the classification regions of the data (like red, yellow, and
black balls), and x, y are the coordinates of the data points (Figure 2.13).

From the diagram it is shown that data points lie in the Rc. 100 data points lies in Ry and 
so forth. 


2.5.5.1 Training the Neural Network


All the training data points $(x_1, x_2)$ and all possible combinations of $R_A$, $R_B$, and $R_C$ are
stored in the neural network (Figure 2.14).


FIGURE 2.13 
Simple diagram of ANN model [8]. 


FIGURE 2.14 
Simple diagrams for training the neural network model [8]. 
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2.5.5.2 Testing the Neural Network 


When new data points arrive (points that were not used to train the model), the neural
network compares them with the trained model to give the approximate
output [30-33] (Figure 2.15).


2.5.6 Genetic Algorithm 


A GA can be used to determine fuzzy membership functions. It is based on Darwin's theory
of evolution and the idea of "survival of the fittest." The procedure is as follows [30]:


1. The membership functions are coded into bit strings.

2. The bit strings are concatenated.

3. The fitness function is used to evaluate each set of membership functions.
4. The process is repeated until convergence is achieved.


2.5.7 Inductive Reasoning 


In inductive reasoning, logical thinking, observation, and experiment are combined to
derive the membership function. As an example, during sunny weather, you don't need an
umbrella.


[Diagram: new data input x → neural network → fuzzy membership value (0, 1).]


FIGURE 2.15 
Simple diagrams for testing the neural network model [8]. 


2.6 Defuzzification 


[Diagram: fuzzy input → defuzzification → crisp quantity.]


FIGURE 2.16
Simple diagram of defuzzification [8].
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2.6.1 Lambda Cut for Fuzzy Sets/Alpha Cut 


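Defuzzification is the process of converting a fuzzy value into a crisp value, a
fuzzy set into a crisp value, or a fuzzy matrix into a single value [30-32].

Defuzzification can be done by the lambda cut method, also called the alpha cut method.

Condition

$$A_\lambda = \{x \mid \mu_A(x) \geq \lambda\}, \quad \lambda \in [0, 1]$$

a) Strong $\lambda$ cut defuzzification: when $\mu_A(x) > \lambda$
b) Weak $\lambda$ cut defuzzification: when $\mu_A(x) \geq \lambda$


Properties of the $\lambda$ cut set

(1) $(A \cup B)_\lambda = A_\lambda \cup B_\lambda$

(2) $(A \cap B)_\lambda = A_\lambda \cap B_\lambda$

(3) $(\overline{A})_\lambda \neq \overline{(A_\lambda)}$, except for $\lambda = 0.5$


Lambda cut for fuzzy relation
Let R be a fuzzy relation

$$R_\lambda = \{(x, y) \mid \mu_R(x, y) \geq \lambda\}$$

For two fuzzy relations R and S

(1) $(R \cup S)_\lambda = R_\lambda \cup S_\lambda$

(2) $(R \cap S)_\lambda = R_\lambda \cap S_\lambda$

(3) $(\overline{R})_\lambda \neq \overline{(R_\lambda)}$, except for $\lambda = 0.5$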


PROBLEM 


Consider a fuzzy set 


Y E 0.6 05 07 
A= + 
xi XQ X3 X4 


lau =( X1,X2,X3,X4) 


Find the crisp sets of elements whose membership value is greater than or equal to 0.5
(and, for comparison, 0.6 and 0.7).
Solution:
$A_{0.5} = \{x_2, x_3, x_4\}$
$A_{0.6} = \{x_2, x_4\}$
$A_{0.7} = \{x_4\}$
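
A lambda cut is a simple filter on the membership values, as the following Python sketch shows. The memberships of x2, x3, and x4 follow the problem above; the value 0.2 for x1 is an assumption made only so that the example runs, and the function name is hypothetical.

```python
def lambda_cut(fuzzy_set, lam, strong=False):
    # Weak cut keeps membership >= lam; strong cut keeps membership > lam.
    if strong:
        return {x for x, m in fuzzy_set.items() if m > lam}
    return {x for x, m in fuzzy_set.items() if m >= lam}

# The membership of x1 is assumed (hypothetical value below 0.5).
A = {"x1": 0.2, "x2": 0.6, "x3": 0.5, "x4": 0.7}

for lam in (0.5, 0.6, 0.7):
    print(lam, sorted(lambda_cut(A, lam)))
print("strong 0.5:", sorted(lambda_cut(A, 0.5, strong=True)))
```

Raising the cut level can only shrink the resulting crisp set, which is why the three sets in the solution are nested.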
Defuzzification methods [5, 28, 29, 33]:


1. Max membership method
2. Centroid method
3. Weighted average method
4. Mean-max membership
5. Center of sums
6. Center of largest area
7. First of maxima, last of maxima


2.6.2 Max Membership Principle 


FIGURE 2.17 
Simple diagram of max membership method [8]. 


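a. This is also known as the height method.
b. It uses the maximum membership value.
c. z* = defuzzified value or crisp value
d. $\mu_A(z^*) \geq \mu_A(z)$ for all $z \in Z$
e. It applies to peaked output functions.


There are three membership functions with heights 0.3, 0.5, and 1; the maximum
membership value is 1, and the corresponding z* is the crisp value [34].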
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FIGURE 2.18 
Max membership defuzzification methods [9]. 


2.6.3 Centroid Method 


The centroid method is also called the center of mass, center of area, or center of gravity method [35].


$$z^* = \frac{\int \mu_A(z) \, z \, dz}{\int \mu_A(z) \, dz}$$


FIGURE 2.19 
Centroid membership defuzzification methods [9]. 
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2.6.4 Weighted Average Method 


This method is valid for symmetric output membership functions. Each membership function
is weighted by its maximum membership value [30].


Let there be three membership functions with center values a, b, and c and maximum
membership values 0.3, 0.5, and 1. Then the defuzzified value is

$$z^* = \frac{0.3 a + 0.5 b + 1 \cdot c}{0.3 + 0.5 + 1}$$


FIGURE 2.20 
Weighted membership defuzzification method [9]. 


2.6.5 Mean-Max Membership or Middle of Maxima 


The center of maxima is another name for this approach [32]. It is closely comparable to
the max membership approach, except that the location of the maximum membership may be
non-unique (a plateau rather than a single point). The output is given by

$$z^* = \frac{a + b}{2}$$

where a and b are the end points of the maximum membership plateau.


This method is only applicable for a symmetrical membership
function.
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FIGURE 2.21 
Mean max membership defuzzification method [9]. 


2.6.6 Center of Sum Methods 


Instead of using their union, this approach uses the algebraic sum of the individual fuzzy
subsets. Its main disadvantage is that overlapping regions are counted twice, although
the computation is fast. The defuzzified value z* is given by

$$z^* = \frac{\sum_{i=1}^{n} \bar{z}_i \, A_{C_i}}{\sum_{i=1}^{n} A_{C_i}}$$

where $A_{C_i}$ is the area of the i-th fuzzy subset and $\bar{z}_i$ is the center of that area.


FIGURE 2.22 
Center of sum membership defuzzification method [9]. 
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For fuzzy set 1: 


id (1+5) = 
2 
ma POS aie 
Fuzzy set 2: 
T (347) s 
2 7 
dtes (2+4)*1 -3 
2 
UA: oe 


(0.6+3) 


2.6.7 Center of Largest Area 


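If the fuzzy set comprises several subregions, the center of largest area method
identifies the subregion with the largest area, and the defuzzified value is the
centroid of that subregion:

$$z^* = \frac{\int \mu_{C_m}(z) \, z \, dz}{\int \mu_{C_m}(z) \, dz}$$

where $C_m$ is the subregion with the largest area.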

Problem: Calculate the defuzzified values given by the different defuzzification methods
for the fuzzy sets in Figure 2.24.


FIGURE 2.23
Center of largest area of membership defuzzification method [9].

FIGURE 2.24
Union of three fuzzy set diagrams [9].

2.7 Examples for Different Defuzzification Methods 
2.7.1 Max Membership Method 


From the diagram it is seen that the maximum membership value is 1, and it is attained
for z* between 6 and 7.


2.7.2 Centroid Method 


The defuzzified value is the centroid of the region bounded by the union of the three
sets. The region is divided into seven sub areas:


Sub area   Area                      z      Area * z
1          0.150                     0.67   0.100
2          3 * 0.3 = 0.90            2.50   2.250
3          0.04                      3.73   0.149
4          2 * 0.5 = 1.00            5.00   5.000
5          (0.5 * 0.5)/2 = 0.125     5.87   0.734
6          1 * 1 = 1.00              6.50   6.500
7          0.50                      7.33   3.665


Total area = 3.715 and the sum of Area * z = 18.398, so z* = 18.398/3.715 ≈ 4.95
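
The centroid is the area-weighted mean of the sub-area centroids, which can be checked with a few lines of Python. The sketch takes the seven areas from the worked example and assumes centroids of 0.67, 2.5, 3.73, 5.0, 5.87, 6.5, and 7.33 (the last being 7 + 1/3 for the final triangle); with these values each product is simply area times centroid.

```python
# (area, centroid) pairs for the seven sub areas; centroid values are
# assumptions stated in the lead-in above.
sub_areas = [
    (0.150, 0.67),
    (0.900, 2.50),
    (0.040, 3.73),
    (1.000, 5.00),
    (0.125, 5.87),
    (1.000, 6.50),
    (0.500, 7.33),
]

total_area = sum(area for area, _ in sub_areas)
moment = sum(area * z for area, z in sub_areas)
z_star = moment / total_area

print("total area:", round(total_area, 3))
print("sum of area * z:", round(moment, 3))
print("z*:", round(z_star, 2))
```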
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2.7.3 Weighted Average Method 


In this method, the center of each fuzzy set is weighted by the membership value at that
center, and the weighted mean over all sets is then taken.

The membership value of the first fuzzy set (trapezoid) at its center 2.5 is 0.3, the
membership value of the second fuzzy set (trapezoid) at its center 5 is 0.5, and the
membership value of the third fuzzy set (trapezoid) at its center 6.5 is 1.


$$z^* = \frac{(2.5 \times 0.3) + (5 \times 0.5) + (6.5 \times 1)}{0.3 + 0.5 + 1} = \frac{9.75}{1.8} \approx 5.417$$
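
The arithmetic can be verified with a minimal Python sketch. It assumes set centers of 2.5, 5, and 6.5 with membership values 0.3, 0.5, and 1 at those centers; the function name is hypothetical.

```python
def weighted_average(centers, heights):
    # Each center is weighted by the membership value at that center.
    numerator = sum(c * h for c, h in zip(centers, heights))
    return numerator / sum(heights)

centers = [2.5, 5.0, 6.5]   # assumed centers of the three fuzzy sets
heights = [0.3, 0.5, 1.0]   # membership values at those centers

z_star = weighted_average(centers, heights)
print(round(z_star, 3))
```

With these inputs the numerator is 9.75 and the denominator is 1.8, so the result is approximately 5.417.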


2.7.4 Mean Max Membership 


Here, the defuzzified value is the average of the range with the highest membership value.
The maximum membership value in this case is 1 for the range (6, 7). Therefore

$$z^* = \frac{6 + 7}{2} = 6.5$$


2.7.5 Center of Sums 
In this approach, the area of each fuzzy set is computed separately, and the centers of
the sets are then weighted by those areas.


Area of the first fuzzy set = 1.2

Area of the second fuzzy set = 1.5

Area of the third fuzzy set = (3 + 1) * 1 / 2 = 2


$$z^* = \frac{(1.2 \times 2.5) + (1.5 \times 5) + (2 \times 6.5)}{1.2 + 1.5 + 2} = \frac{23.5}{4.7} = 5.0$$
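
The center of sums calculation, and the related center of largest area rule, can be sketched in Python. The areas 1.2, 1.5, and 2 and the centers 2.5, 5, and 6.5 are taken from the worked example; the variable names are hypothetical.

```python
areas = [1.2, 1.5, 2.0]     # area of each fuzzy set
centers = [2.5, 5.0, 6.5]   # center of each fuzzy set

# Center of sums: area-weighted mean of the centers.
z_sums = sum(a * c for a, c in zip(areas, centers)) / sum(areas)

# Center of largest area: center of the set whose area is largest.
largest = max(range(len(areas)), key=lambda i: areas[i])
z_largest = centers[largest]

print("center of sums:", round(z_sums, 3))
print("center of largest area:", z_largest)
```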


2.7.6 Center of Largest Area 


According to this procedure, the defuzzified value is the mean value of the fuzzy set with
the largest area. From the diagram it is seen that the third fuzzy set has
the largest area. Therefore z* = 6.5.

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A Practical Approach to Neural Network Models 


3.1 Introduction 


An artificial neural network (ANN) is an efficient, parallel distributed computing system
whose central idea is borrowed from biological neural networks [1, 2]. It consists of a
large collection of units that are interconnected to allow communication between them.
These units, also called nodes or neurons, are simple processors that operate in
parallel [3]. Each neuron is connected to other neurons via a connection link. Each
connection link is associated with a weight that carries information about the input
signal. Since the weight typically excites or inhibits the signal being transmitted,
this is the information the neurons use to solve a particular problem [4]. Each neuron
has an internal state, which is described by an activation signal [5]. Output signals,
produced by combining the input signals with the activation rule, may be sent to other
units (Figure 3.1).

The net input for the general artificial neural network model above may be
calculated as:


$$y_{in} = x_1 w_1 + x_2 w_2 + x_3 w_3 + \cdots + x_n w_n$$

That is, the net input is

$$y_{in} = \sum_{i=1}^{n} x_i w_i$$


Applying the activation function to the net input gives the output:

$$Y = F(y_{in})$$


Output = function (net input calculated)
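
The two steps, net input followed by activation, can be sketched for a single neuron in Python. The input and weight values are hypothetical, and the sigmoid is only one possible choice of activation function, assumed here for illustration.

```python
import math

def net_input(x, w):
    # Weighted sum of the inputs: y_in = sum(x_i * w_i).
    return sum(xi * wi for xi, wi in zip(x, w))

def sigmoid(v):
    # Assumed activation function; the text does not fix a particular one.
    return 1.0 / (1.0 + math.exp(-v))

x = [0.5, -1.0, 2.0]   # hypothetical input signals
w = [0.4, 0.3, 0.1]    # hypothetical connection weights

y_in = net_input(x, w)
y = sigmoid(y_in)
print("net input:", round(y_in, 4))
print("output:", round(y, 4))
```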
The three building blocks that make up ANN processing are as follows:


* Network topology
* Adjustment of weights, or learning
* Activation functions


3.1.1 Network Topology 


A network topology is the arrangement of a network's nodes and connecting links.
According to the network topology, ANNs can be
classified into two main categories, as shown in Figure 3.2.
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FIGURE 3.1 
Typical architecture of ANN [3]. 


[Diagram: ANN based on network topology, divided into feedforward based (single layer feedforward, multilayer feedforward) and feedback based (single layer feedback, single layer recurrent network, multilayer recurrent network).]


FIGURE 3.2 
Types of ANN model based on connecting nodes. 


3.1.1.1 Feed Forward Network 


A feed forward network is a non-recurrent network made up of layers of processing units (nodes), in which each node of a layer is connected to the nodes of the previous layer [5]. Each connection carries its own weight. Because there is no feedback loop, the signal flows in one direction only, from input to output. Feed forward networks are divided into two types: single layer feed forward and multilayer feed forward networks.


Single layer feed forward network 
A single layer feed forward ANN has only one weighted layer; that is, the input layer is fully connected to the output layer, as shown in Figure 3.3.


Multilayer feed forward network 


A multilayer feed forward ANN has more than one weighted layer, with one or more hidden layers between the input and output layers, as shown in Figure 3.4.
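The layered, one-directional flow described above can be sketched in a few lines of Python. The network size (2 inputs, 3 hidden units, 1 output), the weights, and the use of `tanh` as the activation are all hypothetical choices made only for illustration.

```python
import math

def layer_forward(x, weights, biases, f):
    """One weighted layer: each row of `weights` feeds one unit."""
    return [f(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(x, layers, f=math.tanh):
    """Pass x through the layers in order; there are no feedback loops."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases, f)
    return x

# Hypothetical 2-3-1 network: one hidden layer and one output layer.
hidden = ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.5, 0.2]], [0.05])

y = feed_forward([1.0, 2.0], [hidden, output])
print(y)
```

Passing a list with a single layer gives the single layer case; adding more `(weights, biases)` pairs adds hidden layers.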
FIGURE 3.3
Basic diagram of a single layer feed forward network. [Diagram labels: input layer, output layer.]


FIGURE 3.4
Basic diagram of a multilayer feed forward network [5]. [Diagram labels: input, hidden layer.]


3.1.1.2 Feedback Network 


A feedback network, as its name implies, contains feedback paths, so the signal can travel in both directions through loops. As a result, it is a nonlinear dynamic system that evolves continually until it reaches equilibrium [6, 7]. Feedback networks may be classified into the following categories.


Single node with its own feedback 

Feedback networks arise when outputs can be fed back as inputs to nodes in the same layer or in a preceding layer. Closed-loop feedback networks are called recurrent networks [6]. Figure 3.5 shows a single-neuron recurrent network in which the neuron receives its own output as feedback.
FIGURE 3.5
Basic diagram of a single node with its own feedback. [Diagram labels: input, output, feedback.]


FIGURE 3.6 
Basic diagrams of single layer recurrent networks. 


Single layer recurrent network 

In a single layer recurrent network, a feedback link is formed so that a processing element's output can be directed back to itself, to another processing element, or to both, as illustrated in Figure 3.6. A recurrent neural network (RNN) is a class of ANN in which the connections between nodes form a directed graph along a sequence [6-8]. It can therefore exhibit dynamic temporal behavior for a time series. Unlike a feed forward ANN, an RNN can use its internal memory to process input sequences.
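To make the idea of feedback as memory concrete, here is a minimal sketch of a single recurrent unit whose previous output is fed back as an extra input. The weights `w_in` and `w_rec`, the `tanh` activation, and the zero initial state are assumptions chosen for illustration only.

```python
import math

def rnn_step(x_t, h_prev, w_in, w_rec, b):
    """One step of a single recurrent unit: the previous output is fed back."""
    return math.tanh(w_in * x_t + w_rec * h_prev + b)

def run_sequence(xs, w_in=0.5, w_rec=0.8, b=0.0):
    """Process a sequence; h carries information from earlier inputs."""
    h = 0.0
    history = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_in, w_rec, b)
        history.append(h)
    return history

# The same final input (0.0) gives different outputs depending on what came before.
print(run_sequence([1.0, 0.0])[-1])
print(run_sequence([-1.0, 0.0])[-1])
```

A feed forward unit would return the same value for the final input in both sequences, because it has no state that depends on earlier inputs.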


Multilayer recurrent network 

A multilayer recurrent network has one or more internal (hidden) layers that do not communicate directly with the outside of the network, as shown in Figure 3.7. The presence of one or more hidden layers makes the network computationally more powerful. In contrast to a feed forward network, in which the model's outputs are never fed back into it, a processing element's output here can be directed back to processing elements in the same layer or in a preceding layer. Because the fed-back state summarizes earlier inputs, those inputs do not have to be presented again at every step. The main advantage of an RNN is that the hidden layer captures information sequentially.
FIGURE 3.7
Basic diagram of a multilayer recurrent network. 


3.1.2 Adjustments of Weights or Learning 


Learning is the process by which an artificial neural network modifies the weights of the connections between its neurons. Based on the learning method, ANNs may be divided into three groups: supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 3.8.


3.1.2.1 Supervised Learning 


This form of learning is carried out under the supervision of a teacher, as the name suggests [9-11]. When an ANN is trained using supervised learning, the input vector is presented to the network, and the network produces an output vector, as shown in Figure 3.9. This actual output vector is compared with the desired output vector. An error signal is generated if the actual output differs from the desired output. The weights are adjusted on the basis of this error signal until the actual output matches the desired output.


FIGURE 3.8
Classification of ANN model based on learning and weight. [Diagram: classification of ANN based on adjustments of weight and learning into supervised learning, unsupervised learning, and reinforcement learning.]


FIGURE 3.9
Basic diagram of supervised learning. [Diagram labels: input (X), neural network, actual output (D), error signal generator, desired output (Y).]


3.1.2.2 Unsupervised Learning 


Unsupervised learning allows a network to learn from data without the supervision of a teacher [12, 13]; the whole process is autonomous, as depicted in Figure 3.10. When an ANN is trained via unsupervised learning, input vectors of a similar kind are grouped to form clusters. When a new input pattern is applied, the neural network responds by indicating the class to which the pattern belongs. As Figure 3.10 illustrates, in unsupervised learning the environment gives no feedback on whether the desired output was achieved. As a result, in this type of learning the network itself updates its weights and discovers the patterns [14, 15], features, and relationships in the data.


3.1.2.3 Reinforcement Learning 


In reinforcement learning, a learning algorithm is used to strengthen, or reinforce, the network on the basis of critic information [16, 17]. The learning process is quite similar to supervised learning. During training the network receives some feedback from the environment, as shown in Figure 3.11. This feedback is evaluative (a numerical score) rather than instructive: it indicates how good the output was, not what the correct output should have been. After receiving the feedback, the network adjusts its weights in order to receive better critic information in the future [18, 19].


3.1.3 Activation Functions 


The activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to introduce nonlinearity into the output of a neuron. The weights, biases, and activation functions of the neurons determine how a neural network behaves. The weights and biases of the neurons are updated on the basis of the error at the output; this process is known as back-propagation. Activation functions make back-propagation possible because they supply the gradients that, together with the error, are needed to update the weights and biases.
Several commonly used activation functions are listed below.


FIGURE 3.10
Diagram of unsupervised learning. [Diagram labels: actual input (X), neural network, actual output (Y).]


FIGURE 3.11
Diagram of reinforcement learning. [Diagram labels: input (X), neural network, actual output (D), error signal generator, reinforcement signal (R).]


3.1.3.1 Type of Activation Function 


TABLE 3.1
Characteristics of Different Types of Activation Function [20-22]


Sl. No.   Name of the Activation Function   Characteristics


1   Linear activation function (see Figure 3.12)
The linear activation function follows the equation $y = mx$, a straight line through the origin with a positive slope, as illustrated in Figure 3.12. No matter how many layers a network has, if all of them are linear, the activation of the last layer is still just a linear function of the input to the first layer. Its values range from $-\infty$ to $+\infty$. It is used at the output layer. Differentiating a linear function gives a constant, so the derivative no longer depends on the input $x$.


2   Sigmoid activation function (see Figure 3.13)
This activation function has an S-shaped graph and is represented mathematically by $f(x) = \frac{1}{1 + e^{-x}}$. The characteristic graph shows that the function is nonlinear and that the output changes steeply with $x$ near the origin, as shown in Figure 3.13. The output ranges from 0 to 1. The output is easily turned into a binary prediction: 1 when the value exceeds 0.5 and 0 otherwise.


3   Tanh activation function (see Figure 3.14)
The tanh (hyperbolic tangent) activation function, shown in Figure 3.14, often works better than the sigmoid function. It is a scaled and shifted version of the sigmoid function, and each can be derived from the other. Mathematically it is represented by $f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1$. The output of this activation function ranges from -1 to +1. Because the output is centered on zero, learning for the subsequent layer is much easier.


4   ReLU activation function (see Figure 3.15)
ReLU stands for rectified linear unit. It is the most widely used activation function and is applied mainly in the hidden layers of neural networks, as shown in Figure 3.15. This nonlinear function is represented by $A(x) = \max(0, x)$, and its output lies in $[0, \infty)$. Because ReLU involves simpler mathematical operations than tanh and sigmoid, it requires less computing power. Only a few neurons are active at any one time, so the network is sparse, which makes computation simple and efficient. In practice, networks with ReLU learn much faster than networks with sigmoid or tanh functions.


5   Softmax function (see Figure 3.16)
The Softmax activation function is a nonlinear function related to the sigmoid that is useful in multiclass classification problems, as shown in Figure 3.16. It squeezes the output for each class into the range 0 to 1. The Softmax function is applied at the output layer of a classifier, where the goal is to obtain the probability that an input belongs to each class.
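The five functions in Table 3.1 can be written directly from their formulas. The sketch below is illustrative only: the function names and the default slope `m=1.0` are assumptions, and the max-subtraction in `softmax` is a common numerical safeguard rather than part of the definition.

```python
import math

def linear(x, m=1.0):
    return m * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Written via the sigmoid to show the relation tanh(x) = 2*sigmoid(2x) - 1.
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    return max(0.0, x)

def softmax(values):
    # Subtracting the maximum avoids overflow and does not change the result.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

for v in (-2.0, 0.0, 2.0):
    print(v, linear(v), sigmoid(v), tanh(v), relu(v))
print(softmax([1.0, 2.0, 3.0]))
```

The softmax outputs always sum to 1, which is why they can be read as class probabilities at the output layer.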


FIGURE 3.12 
Characteristics graph for linear function. 


FIGURE 3.13 
Characteristics graph for sigmoid function. 
FIGURE 3.14
Characteristics graph of tanh activation function. 


FIGURE 3.15 
Characteristics graph for ReLu activation function. 


FIGURE 3.16 
Characteristics graph for Softmax activation function 
3.1.4 Learning Rules in Neural Network


Learning rules, also known as learning processes, update a network's weights and bias levels while the network is simulated on a particular data pattern, in order to improve the performance of the neural network model. Applying a learning rule is an iterative process [23, 24]. A neural network improves its operation by learning from its environment [25]. The main neural network learning rules are shown in Figure 3.17.


3.1.4.1 Hebbian Learning Rule 


The Hebbian rule was the first learning rule, introduced in 1949 by Donald Hebb as a learning technique for unsupervised neural networks. It describes how to update the weights between the nodes of a network [26, 27]. The Hebb learning rule assumes that if two neighboring neurons are activated and deactivated at the same time, the weight connecting them should increase. For neurons that operate in opposite phase, the weight between them should decrease. If there is no signal correlation, the weight should not change. A strong positive weight develops between two nodes when their inputs are both positive or both negative.

A strong negative weight develops between two nodes if the input of one is positive while the input of the other is negative. All weights are set to zero at the beginning. This learning rule can be used with both soft and hard activation functions. It is an unsupervised learning rule, since the learning process does not use the desired responses of the neurons. One drawback is that the absolute values of the weights are usually proportional to the learning time, which is undesirable.


Mathematical Formulation 
According to the Hebbian learning rule, the connection weight is updated at every step of the algorithm by the following rule:

$$\Delta w_{ji}(t) = \alpha\, x_i(t)\, y_j(t)$$

Here, $\Delta w_{ji}(t)$ = the amount by which the weight of the connection changes at time step $t$

$\alpha$ = the positive and constant learning rate

$x_i(t)$ = the input value coming from the presynaptic neuron at time step $t$
$y_j(t)$ = the output of the postsynaptic neuron at the same time step $t$
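A minimal sketch of the Hebbian update for the weights feeding one output neuron is shown below. The function name `hebbian_update`, the learning rate of 0.1, and the bipolar example signals are assumptions made for illustration.

```python
def hebbian_update(w, x, y, alpha=0.1):
    """Add alpha * x_i * y to every weight feeding one output neuron."""
    return [wi + alpha * xi * y for wi, xi in zip(w, x)]

# Weights start at zero, as the text states.
w = [0.0, 0.0]

# Input 0 and the output are active together; input 1 is in the opposite phase.
w = hebbian_update(w, x=[1.0, -1.0], y=1.0)
w = hebbian_update(w, x=[1.0, -1.0], y=1.0)
print(w)
```

Repeating the same update keeps increasing the magnitude of the weights, which illustrates the drawback that weight values grow with learning time.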


[Figure 3.17 diagram: learning rules of the NN model. Hebbian rule (used to improve the weights of nodes); perceptron rule (assigning a random value to each weight); delta learning rule (change in synaptic weight equals the product of input vector and error); outstar learning rule (nodes are arranged in a layer); correlation rule (a supervised learning rule).]


FIGURE 3.17 
Types of learning rule of a NN model. 
3.1.4.2 Perceptron Learning Rule


Each connection in a neural network has a weight that changes as the network learns. The perceptron rule is a supervised learning rule in which the network starts its learning process by assigning a random value to each weight [28-30]. The output value is then calculated for a set of records for which the expected output value is known. Following that, the network compares the calculated output value with the expected value and adjusts the weights accordingly.


Mathematical Formulation 

To understand the mathematical formulation, assume that there is a finite number of input vectors x(n), each with a desired (target) output T(n), where n = 1 to M. As previously described, the output Y is determined from the net input, and the activation function applied to that net input can be written as follows:

$$Y = f(Y_{in}) = \begin{cases} 1, & Y_{in} > \theta \\ 0, & Y_{in} \le \theta \end{cases}$$

where $\theta$ is a threshold value. Updating of the weight follows two different cases.


Case 1. When $T \ne Y$, then $w_{new} = w_{old} + T x$
Case 2. When $T = Y$, no change in weight
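The two cases can be sketched as a short training loop. Note the assumptions: this sketch uses the widely used error-correction form of the perceptron rule, w ← w + α(T − Y)x, which suits 0/1 targets, starts from zero weights rather than random ones so that the run is reproducible, and treats the bias as a weight on a constant input of 1. The function names and the OR data set are illustrative.

```python
def step(y_in, theta=0.0):
    """Threshold activation: 1 if the net input exceeds theta, else 0."""
    return 1 if y_in > theta else 0

def train_perceptron(samples, alpha=1.0, epochs=20):
    """samples: list of (inputs, target) pairs with 0/1 targets."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0  # bias, handled as a weight on a constant input of 1 (assumption)
    for _ in range(epochs):
        changed = False
        for x, t in samples:
            y = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            if y != t:                      # Case 1: weights change
                err = t - y
                w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
                b += alpha * err
                changed = True
            # Case 2 (t == y): nothing to do
        if not changed:
            break
    return w, b

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

or_gate = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(or_gate)
print(w, b, [predict(w, b, x) for x, _ in or_gate])
```

Training stops as soon as a full pass over the data produces no weight change, which happens after a few passes for a linearly separable problem such as OR.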


3.1.4.3 Delta Learning Rule 


The delta rule, introduced by Widrow and Hoff, is one of the most popular learning rules. It is a supervised learning rule. According to this rule, the change in a node's synaptic weight equals the product of the error and the input [31]. It applies to supervised learning with a continuous activation function. The rule is based on the gradient-descent approach, which can continue indefinitely. The delta rule changes the synaptic weights so as to minimize the difference between the net input to the output unit and the target value.


Mathematical Formulation 
Under the delta rule, the update of the synaptic weights is given by

$$\Delta w_i = \alpha\, x_i\, e_j$$

Here,
$\Delta w_i$ = weight change for the ith pattern
$\alpha$ = the positive and constant learning rate
$x_i$ = the input value from the presynaptic neuron
$e_j = (t - y_{in})$ = the difference between the desired/target output and the actual output $y_{in}$


This delta rule is for a single output unit only; the weight is updated in the following ways.


Case 1. When $t \ne y$, then $w_{new} = w_{old} + \Delta w$
Case 2. When $t = y$, then no change in weight.
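As a small worked sketch, the snippet below applies the delta rule repeatedly to one training pattern and records how the error shrinks. The pattern `x`, the target `t`, and the learning rate of 0.1 are arbitrary illustrative values.

```python
def delta_update(w, x, t, alpha=0.1):
    """One delta-rule step for a single linear output unit."""
    y_in = sum(wi * xi for wi, xi in zip(w, x))   # net input
    e = t - y_in                                  # error e = t - y_in
    return [wi + alpha * xi * e for wi, xi in zip(w, x)], e

w = [0.0, 0.0]
x, t = [1.0, 2.0], 1.0

errors = []
for _ in range(5):
    w, e = delta_update(w, x, t)
    errors.append(abs(e))
print(w, errors)
```

With these values each step removes half of the remaining error, because the learning rate times the squared length of `x` equals 0.5.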
3.1.4.4 Competitive Learning Rule (Winner-takes-all)


In some unsupervised neural networks, the output nodes compete with one another to represent the input pattern. To understand this learning rule, we first need to understand the competitive network [32]. This network is a single layer feed forward network with feedback connections between the outputs. The connections between the outputs are inhibitory, which means that the competitors never support themselves. The output nodes compete with one another. The key idea is that, during training, the output unit with the highest activation for a given input pattern is declared the winner. Because only the winning neuron is updated while the losing neurons are left unchanged, this rule is also known as the winner-takes-all rule.


Mathematical formulation 
The three important elements in the mathematical formulation of this learning rule are as follows:


Condition to be a winner: A neuron $y_k$ is considered the winner if it satisfies the following condition:

$$y_k = \begin{cases} 1, & \text{if } v_k > v_j \text{ for all } j,\ j \ne k \\ 0, & \text{otherwise} \end{cases}$$

This means that $y_k$ wins the competition when its induced local field, $v_k$, is the largest among all the neurons in the network.


Condition on the sum of weights: Another constraint of this learning rule is that the sum of the weights assigned to a given output neuron should be 1:

$$\sum_j w_{kj} = 1 \quad \text{for all } k$$


Change of weight for the winner: If a neuron does not respond to the input pattern, no learning takes place in that neuron. If a neuron wins, its weights are updated so that they move toward the input:

$$\Delta w_{kj} = \begin{cases} \alpha\,(x_j - w_{kj}), & \text{if neuron } k \text{ wins} \\ 0, & \text{if neuron } k \text{ loses} \end{cases}$$

where $\alpha$ is the learning rate.
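One winner-takes-all step can be sketched as follows. The sketch assumes the sign convention in which the winner's weights move toward the input, uses the weighted sum as the induced local field, and picks example weights and an input whose components each sum to 1; all names and values are illustrative.

```python
def competitive_step(weights, x, alpha=0.5):
    """weights[k] is the weight vector of output neuron k."""
    # Induced local field v_k of each output neuron.
    v = [sum(wkj * xj for wkj, xj in zip(wk, x)) for wk in weights]
    k = max(range(len(v)), key=lambda idx: v[idx])   # winner
    # Only the winner moves towards the input; losers are unchanged.
    weights[k] = [wkj + alpha * (xj - wkj) for wkj, xj in zip(weights[k], x)]
    return k

# Each row sums to 1, matching the weight-sum condition.
weights = [[0.8, 0.2], [0.3, 0.7]]
x = [1.0, 0.0]            # example input whose components also sum to 1
winner = competitive_step(weights, x)
print(winner, weights)
```

Because the input components sum to 1 in this example, the winner's weights still sum to 1 after the update.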


3.1.4.5 Outstar Learning Rule 
The Outstar learning rule, introduced by Grossberg, is a supervised learning rule because the desired outputs are known. This rule is applied to neurons arranged in a layer [33]. It is designed specifically to produce the desired output d of a layer of p neurons.

Mathematical formulation: The weight adjustment made by the rule is given by

$$\Delta w_j = \alpha\,(d - w_j)$$

Here d is the desired neuron output and $\alpha$ is the learning rate.
3.1.5 McCulloch-Pitts Neuron


This is the first mathematical model of a biological neuron, proposed by Warren McCulloch and Walter Pitts in 1943. It is a linear threshold gate model and a basic building block of neural networks [34, 35]. The neurons are connected by directed weighted paths, as shown in Figure 3.18. A neuron has two possible states: active (logic 1) and silent (logic 0).


Architecture 

The overall operation of this network is divided into two parts. In the first part, the neuron takes the inputs I, multiplies each by its individual weight, and sums the products. In the second part, a threshold decision on this sum produces the two-level output y. For example, the decision whether to watch cricket on TV takes binary inputs (0, 1), and the response is also binary, that is, 1 for will watch cricket and 0 for won't watch it.


Concept/Condition: 
Let the threshold be denoted by T; then

$$Y = \begin{cases} 1, & X > T \\ 0, & \text{otherwise} \end{cases}$$

where $X = w_1 I_1 + w_2 I_2 + w_3 I_3 + \dots + w_n I_n$


Bias/Threshold:
This is the minimum value of weighted active input for a neuron to fire. If the effective input (X) is larger than the threshold value T, then output (Y) = 1; otherwise output (Y) = 0.

$$\text{Output} = f(a), \quad a = \sum_{i=1}^{n} w_i I_i - T, \quad f(a) = \begin{cases} 1, & a > 0 \\ 0, & \text{otherwise} \end{cases}$$
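The threshold condition can be sketched as a small function. The unit weights and the thresholds 1.5 and 0.5 used below to obtain the AND and OR gates are assumed example values, and `mcp_neuron` is a hypothetical name.

```python
def mcp_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs exceeds the threshold."""
    x = sum(w * i for w, i in zip(weights, inputs))
    return 1 if x > threshold else 0

# With unit weights, a threshold of 1.5 gives AND and 0.5 gives OR (assumed values).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcp_neuron([a, b], [1, 1], 1.5), mcp_neuron([a, b], [1, 1], 0.5))
```

Changing only the threshold changes which logic function the same neuron computes.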


3.1.6 Simple Neural Nets for Pattern Classification 


The simplest neural network architecture for pattern classification consists of a layer of input units and a single output unit [36, 37]. Most neural nets for pattern recognition use the single layer architecture shown in Figure 3.19; a simplified neural network is shown in Figure 3.20.


$$net = b + \sum_{i=1}^{n} x_i w_i \qquad (3.1)$$


FIGURE 3.18
Architecture of McCulloch-Pitts neuron. [Diagram labels: input (I), output (y).]


FIGURE 3.19
Single layer neural networks for pattern classification. [Diagram labels: weights w2, wn.]


FIGURE 3.20
Simple neural networks.


Biases and Threshold: 

A bias acts like a weight on a connection from a unit whose activation is constant at 1. Increasing the bias increases the net input to the unit. If a bias is included, the activation function is represented as

$$f(net) = \begin{cases} 1, & net \ge 0 \\ -1, & net < 0 \end{cases} \qquad \text{where } net = b + \sum_{i=1}^{n} x_i w_i$$

If a bias weight is not used, a fixed threshold $\theta$ takes its place, and the activation function can be expressed as

$$f(net) = \begin{cases} 1, & net \ge \theta \\ -1, & net < \theta \end{cases} \qquad \text{where } net = \sum_{i=1}^{n} x_i w_i$$

Role of bias and threshold: Here, we distinguish between the regions of the input space where the net gives a positive response and those where it gives a negative response. For two inputs,

$$net = b + w_1 x_1 + w_2 x_2$$
The separating line divides the values of $x_1$ and $x_2$ for which the net responds positively from the values for which it responds negatively:

$$b + w_1 x_1 + w_2 x_2 = 0$$

Solving for $x_2$ (with $w_2 \ne 0$) gives

$$x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}$$

To ensure that the net responds correctly to the training data, the values of $w_1$, $w_2$, and b are determined during the training process.
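The separating line can be checked numerically with a short sketch. The weights `w1 = w2 = 1` and bias `b = -1` are hypothetical values that happen to realize the bipolar AND function; the function names are illustrative.

```python
def net(x1, x2, w1, w2, b):
    return b + w1 * x1 + w2 * x2

def boundary_x2(x1, w1, w2, b):
    """x2 on the separating line b + w1*x1 + w2*x2 = 0 (requires w2 != 0)."""
    if w2 == 0:
        raise ValueError("w2 must be non-zero to solve for x2")
    return -(w1 / w2) * x1 - b / w2

# Hypothetical weights that realise the bipolar AND function.
w1, w2, b = 1.0, 1.0, -1.0
x2_on_line = boundary_x2(0.0, w1, w2, b)       # where the line crosses x1 = 0
print(x2_on_line, net(0.0, x2_on_line, w1, w2, b))
print(net(1, 1, w1, w2, b), net(1, -1, w1, w2, b))
```

Points on the line give a net input of zero, the point (1, 1) lies on the positive side, and the other three bipolar input points lie on the negative side.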


3.1.7 Linear Separability


Two classes of patterns are said to be linearly separable when they can be divided by a decision boundary that is represented by a linear equation.

In Figure 3.21, the two classes in the $(x_1, x_2)$ plane can be separated by a single line L. Such patterns are known as linearly separable patterns (Figure 3.21).


FIGURE 3.21
Diagram of linear separability [15]. [Diagram: two input axes, with the two classes separated by the line L.]


Linear separability for AND problem (Figure 3.22) 


A B Y 
-1 -1 -1 
-1 1 -1 
1 -1 -1 
1 1 1 
[Figure 3.22 diagram: the four input points in the (A, B) plane; Class 1 (when Y = 1), Class 2 (when Y = -1).]
So, AND is linearly separable.


FIGURE 3.22 
Linear separability for AND operation [15]. 
Linear separability for OR problem (Figure 3.23)


[Figure 3.23 diagram: the four input points in the (A, B) plane; Class 1 (when Y = 1), Class 2 (when Y = -1).]
So, OR is linearly separable.


FIGURE 3.23 
Linear separability for OR operation [16]. 


Linear separability for XOR problem (Figure 3.24) 


FIGURE 3.24 
Linear separability for XOR gate [16]. 


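So XOR logic is not linearly separable: no single straight line can separate the points for which Y = 1 from the points for which Y = -1.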
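The three cases can be checked with a small brute-force sketch that searches a grid of candidate weights and biases for a separating line. The grid values and function name are assumptions; a failed grid search is not in general a proof, but for XOR the result agrees with a short argument: the two Class 1 conditions together require b > 0, while the two Class 2 conditions together require b < 0.

```python
from itertools import product

def separable(samples, grid=(-2, -1, 0, 1, 2)):
    """Search a small grid of (w1, w2, b) for a line that classifies all samples.

    A sample with target +1 needs net > 0; a sample with target -1 needs net < 0.
    Returns the first solution found, or None if the grid contains none.
    """
    for w1, w2, b in product(grid, repeat=3):
        if all((b + w1 * x1 + w2 * x2) * t > 0 for (x1, x2), t in samples):
            return (w1, w2, b)
    return None

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
AND = list(zip(inputs, [-1, -1, -1, 1]))
OR = list(zip(inputs, [-1, 1, 1, 1]))
XOR = list(zip(inputs, [-1, 1, 1, -1]))

for name, data in (("AND", AND), ("OR", OR), ("XOR", XOR)):
    print(name, separable(data))
```

A separating line is found for AND and OR, while no candidate works for XOR.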


3.1.8 Perceptron 


The perceptron was developed by Rosenblatt using the McCulloch and Pitts model. It is the fundamental building block of an artificial neural network. It is a binary classifier that divides labeled data into two classes.
FIGURE 3.25
Typical architecture of perceptron. [Diagram labels: inputs, weights, activation function, output Y.]


Characteristics of operation 

It is made up of a single neuron with any number of inputs and adjustable weights; the neuron's output is either 1 or 0, depending on the threshold.
This neuron’s bias weight value is 1. 


Architecture 
The simplified diagram is shown in Figure 3.25, and a description of each unit is explained 
in the next subsection. 


Basic elements of perceptron 


Links: The perceptron has a set of connection links, each of which carries a weight, with the bias generally carrying a weight of 1.


Adder: The adder sums the inputs after they have been multiplied by their respective weights.


Activation function: The activation function limits the neuron's output. The most basic activation function is the Heaviside step function, which has two possible outputs: it returns 1 if the input is positive and 0 if the input is negative.


3.2 Adaptive Linear Neuron (Adaline) 


A network with a single linear unit is called Adaline, which stands for adaptive linear neuron. It uses a bipolar activation function [38, 39]. During training it applies the delta rule to minimize the mean squared error (MSE) between the actual output and the desired (target) output. Its weights and bias are adjustable. Adaline's basic structure resembles that of a perceptron, with an additional feedback loop through which the actual output is compared with the desired (target) output [40]. After this comparison, the weights and bias are updated by the training algorithm. The simplified diagram of Adaline is shown in Figure 3.26.
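A minimal sketch of Adaline training with the delta rule on the bipolar AND problem is given below. The learning rate, the number of epochs, zero initial weights, and the convention that a net input of exactly zero maps to +1 are assumptions made for illustration.

```python
def train_adaline(samples, alpha=0.1, epochs=50):
    """LMS/delta-rule training of one linear unit with a bias."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            err = t - y_in                       # error on the linear output
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
    return w, b

def mse(samples, w, b):
    return sum((t - (b + sum(wi * xi for wi, xi in zip(w, x)))) ** 2
               for x, t in samples) / len(samples)

def classify(w, b, x):
    """Bipolar activation applied after training."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

and_gate = [([-1, -1], -1), ([-1, 1], -1), ([1, -1], -1), ([1, 1], 1)]
w, b = train_adaline(and_gate)
print(w, b, mse(and_gate, w, b), [classify(w, b, x) for x, _ in and_gate])
```

The error is computed on the linear net input rather than on the thresholded output, which is the main practical difference from perceptron training; the MSE therefore settles at a non-zero value even when every pattern is classified correctly.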
FIGURE 3.26
Basic diagram of Adaline. [Diagram label: comparison with target output.]


3.2.1 Multiple Adaptive Linear Neurons (Madaline) 


A network made up of several Adalines in parallel is called Madaline, which stands for multiple adaptive linear neurons. It has a single output unit [41, 42]. The Adalines act as hidden units between the input layer and the Madaline layer, as in a multilayer perceptron. The weights and biases between the input and Adaline layers are adjustable, as in the Adaline architecture. The weights and bias between the Adaline and Madaline layers are fixed at 1. Training can be carried out with the delta rule.


Architecture: 

The Madaline architecture consists of n neurons in the input layer, m neurons in the Adaline layer, and one neuron in the Madaline layer. The Adaline layer lies between the input layer and the output (Madaline) layer and can therefore be regarded as the hidden layer. The simplified diagram of Madaline is shown in Figure 3.27.


FIGURE 3.27 
Basic architecture of Madaline. 
3.2.2 Associative Memory Network


Based on the idea of pattern association, these neural networks can store a set of patterns, and when producing an output they select one of the stored patterns by matching it with the input pattern [43, 44]. These memories are also called content-addressable memory (CAM). Associative memory performs a parallel search through the stored patterns, which act as data files.

The two categories of associative memories that we can see are as follows: 


* Auto associative memory 
* Hetero associative memory 


3.2.3 Auto Associative Memory 


This is a single layer neural network in which the input training vectors and the output target vectors are the same. The weights are determined so that the network stores a set of patterns.


Architecture
In an auto associative memory network, the number of output target vectors equals the number of input training vectors, as depicted in Figure 3.28.


3.2.4 Hetero Associative Memory 


Like the auto associative memory network, this is a single layer neural network. In this network, however, the input training vectors and the output target vectors are different. The weights are determined so that the network stores a set of patterns. A hetero associative network is static in nature, so it involves no nonlinear or delay operations.


3.2.4.1 Architecture 


A hetero associative memory network has n input training vectors and m output target vectors, as depicted in Figure 3.29.


FIGURE 3.28 
General diagram of auto associative memory. 
FIGURE 3.29
Basic diagram of hetero associative memory. 


3.3 Bidirectional Associative Memory 


Bidirectional associative memory (BAM) is a supervised learning model in artificial neural networks. It is a hetero-associative memory: given an input pattern, it can return another pattern, possibly of a different size. This behavior is quite similar to that of the human brain. Human memory is necessarily associative. It uses a chain of mental associations, such as those between faces and names or between exam questions and answers, to recover lost memories [45, 46]. Associating one type of object with another in this way requires a recurrent neural network (RNN) that receives an input pattern at one set of neurons and generates a related but different output pattern at another set of neurons. The main objective of introducing such a network model is to store hetero-associative pattern pairs. It is used to retrieve a pattern from a noisy or incomplete pattern.


BAM architecture: 

Given an n-dimensional input vector X from set A, BAM recalls the m-dimensional vector Y from set B. Likewise, when Y is presented as input, BAM recalls X. The simplified bidirectional associative memory network is shown in Figure 3.30.
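Bidirectional recall can be sketched with bipolar patterns. The text above does not state how the weights are obtained, so this sketch assumes the commonly used Hebbian outer-product construction, in which the weight matrix is the sum of the outer products of the stored pairs; the pattern pairs and function names are hypothetical.

```python
def sign(v, previous):
    """Bipolar threshold; a zero net input keeps the previous state."""
    if v > 0:
        return 1
    if v < 0:
        return -1
    return previous

def bam_weights(pairs, n, m):
    """W = sum over pairs of the outer product x y^T (n x m matrix)."""
    w = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += x[i] * y[j]
    return w

def recall_y(w, x, y_prev=None):
    m = len(w[0])
    y_prev = y_prev or [1] * m
    return [sign(sum(x[i] * w[i][j] for i in range(len(x))), y_prev[j])
            for j in range(m)]

def recall_x(w, y, x_prev=None):
    n = len(w)
    x_prev = x_prev or [1] * n
    return [sign(sum(w[i][j] * y[j] for j in range(len(y))), x_prev[i])
            for i in range(n)]

# Two hypothetical bipolar pairs: X has n = 4 components, Y has m = 2.
pairs = [([1, 1, -1, -1], [1, -1]),
         ([1, -1, 1, -1], [1, 1])]
w = bam_weights(pairs, n=4, m=2)

print(recall_y(w, [1, 1, -1, -1]))      # forward: X -> Y
print(recall_x(w, [1, 1]))              # backward: Y -> X
```

The same weight matrix is used in both directions: it is applied as is for the forward pass and transposed for the backward pass.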


Layer A Layer B 


FIGURE 3.30 
Basic diagram of bidirectional associative memory. 
Limitations of BAM:
Storage capacity of the BAM: The number of associations stored in the BAM should not exceed the number of neurons in the smaller layer.

Incorrect convergence: BAM may not always produce the closest association.


3.4 Self-Organizing Maps: Kohonen Maps 


A self-organizing map (SOM), also known as a Kohonen map, is a type of artificial neural network inspired by the biological models of neural systems of the 1970s. It is trained with an unsupervised, competitive learning algorithm. It maps multidimensional data onto a lower dimensional space, which reduces the complexity of the problem and makes the data easier to interpret. The structure of a SOM with n input features and two clusters is given in Figure 3.31.


FIGURE 3.31 
Simple diagram of self-organization map. 


ALGORITHM 


Step 1: Initialize the weights $w_{ij}$; random values may be assumed. Initialize the learning rate $\alpha$.
Step 2: Calculate the squared Euclidean distance for each j = 1 to m:

$$D(j) = \sum_{i=1}^{n} (x_i - w_{ij})^2$$

where i = 1 to n indexes the inputs, j = 1 to m indexes the clusters, and $w_{ij}$ is the weight between input i and cluster j.

Step 3: Find the winning unit index J such that D(J) is minimum.

Step 4: For all units j within a specified neighborhood of J, and for all i, calculate the new weight:

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,[x_i - w_{ij}(\text{old})]$$

Step 5: Update the learning rate $\alpha$ using the formula

$$\alpha(t+1) = 0.5\,\alpha(t)$$
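The five steps can be sketched as follows. For reproducibility the sketch uses fixed initial weights instead of random ones, and it updates only the winning unit (a neighborhood of radius zero); the data, the initial learning rate of 0.5, and the function names are assumptions.

```python
def som_train(samples, weights, alpha=0.5, epochs=10):
    """weights[j] is the weight vector of cluster unit j (no neighbourhood)."""
    for _ in range(epochs):
        for x in samples:
            # Step 2: squared Euclidean distance to every cluster unit.
            d = [sum((xi - wij) ** 2 for xi, wij in zip(x, wj)) for wj in weights]
            # Step 3: winning unit J has the smallest distance.
            J = d.index(min(d))
            # Step 4: move the winner towards the input.
            weights[J] = [wij + alpha * (xi - wij) for xi, wij in zip(x, weights[J])]
        alpha *= 0.5   # Step 5: halve the learning rate
    return weights

def winner(weights, x):
    d = [sum((xi - wij) ** 2 for xi, wij in zip(x, wj)) for wj in weights]
    return d.index(min(d))

# Hypothetical data: four binary samples, two cluster units, fixed initial weights.
samples = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
weights = [[0.2, 0.4, 0.6, 0.8], [0.9, 0.7, 0.5, 0.3]]
weights = som_train(samples, weights)
print([winner(weights, x) for x in samples])
```

After training, `winner` assigns each sample to the cluster unit whose weight vector is closest to it.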
3.5 Learning Vector Quantization (LVQ)


Learning vector quantization (LVQ) is a type of ANN that is also inspired by biological models of neural systems. It is based on a supervised classification algorithm and trains its network through a competitive learning algorithm similar to that of the self-organizing map. It can handle multiclass classification problems [47, 48]. LVQ has two layers: an input layer and an output layer, as shown in Figure 3.32.


FIGURE 3.32 
Simple diagram of learning vector quantization. 


ALGORITHMS 


Step 1: Initialize the reference vectors: from the given set of training vectors, take the first m training vectors (m being the number of clusters) as weight vectors, and keep the remaining vectors for training.

Step 2: Calculate the squared Euclidean distance for each j = 1 to m, summing over the inputs i = 1 to n:

$$D(j) = \sum_{i=1}^{n} (x_i - w_{ij})^2$$

Find the winning unit index J for which D(J) is minimum.
Step 3: Update the weights of the winning unit $w_J$ using the following conditions:


If $T = J$: $w_J(\text{new}) = w_J(\text{old}) + \alpha\,[x - w_J(\text{old})]$
If $T \ne J$: $w_J(\text{new}) = w_J(\text{old}) - \alpha\,[x - w_J(\text{old})]$


where T is the target class of the training vector and J is the class of the winning unit.
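The steps above can be sketched with two classes and one reference vector per class. The data points, the learning rate of 0.1, and the function names are hypothetical, and the learning rate is kept constant for simplicity.

```python
def nearest(weights, x):
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    return d.index(min(d))

def lvq_train(prototypes, labels, samples, alpha=0.1, epochs=10):
    """prototypes[j] is the reference vector of class labels[j]."""
    for _ in range(epochs):
        for x, t in samples:
            J = nearest(prototypes, x)
            # Move towards x when the classes agree, away from x otherwise.
            sign = 1.0 if labels[J] == t else -1.0
            prototypes[J] = [wi + sign * alpha * (xi - wi)
                             for xi, wi in zip(x, prototypes[J])]
    return prototypes

# Hypothetical data; the first vector of each class seeds its reference vector.
data = [([0.0, 0.0], 0), ([1.0, 1.0], 1),
        ([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.8, 0.9], 1)]
prototypes = [list(data[0][0]), list(data[1][0])]
labels = [data[0][1], data[1][1]]
prototypes = lvq_train(prototypes, labels, data[2:])

print(prototypes)
print(labels[nearest(prototypes, [0.3, 0.3])], labels[nearest(prototypes, [0.7, 0.6])])
```

A new vector is classified by the label of its nearest reference vector, as in the last line of the sketch.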
3.6 Counter Propagation Network (CPN)


A counter propagation network (CPN) is a multilayer network based on combinations of the input, cluster, and output layers. It is constructed from an instar and an outstar model [49]. The instar-outstar model is a three layer model that learns an input-output mapping, producing an output vector Y in response to an input vector X. A CPN is trained in two stages: in the first stage the input vectors are clustered, and in the second stage the weights from the cluster layer to the output layer are adjusted to obtain the desired output [50, 51]. A CPN has three layers, namely the input layer, the hidden (cluster) layer, and the output layer. The hidden layer is also known as the Kohonen layer, and the output layer is known as the Grossberg layer.


Classification 
3.6.1 Full Counter Propagation Network (FCPN) 


This CPN creates a look-up table in which several combinations of X:Y vector pairs are
available. It works most effectively when the inverse function exists. The simplified
diagram of the FCPN is shown in Figure 3.33.


3.6.2 Forward Only Counter Propagation Network 


This is a simplified form of the FCPN, as shown in Figure 3.34. In this CPN, clusters are
formed only at the Kohonen units. First the weights between the input layer and the
cluster layer are trained, and then the weights between the cluster layer and the output
layer are trained. Here the target of the network must be known.


FIGURE 3.33
Basic architecture of full counter propagation network (labels: input neuron, output neuron).


FIGURE 3.34
Basic architecture of forward only counter propagation network (labels: input layer, Kohonen layer, Grossberg layer).


3.7 Adaptive Resonance Theory (ART) 


Stephen Grossberg and Gail Carpenter introduced this network in 1987. It has two key
features: adaptiveness (it remains open to new learning) and resonance (it preserves prior
knowledge). The fundamental ART uses unsupervised learning, and ART networks perform
clustering: input patterns are presented to the network, and the algorithm assigns them
to clusters.


Types of Adaptive Resonance Theory (ART) 
After 20 years of study, Carpenter and Grossberg created many ART architectures. ART
networks fall into the following categories.

* ART1 is the most basic and straightforward ART architecture. It can cluster binary
input values.
* ART2 is an extension of ART1 that can cluster input data with continuous values.
* Fuzzy ART combines fuzzy logic with ART.
* ARTMAP is a supervised form of ART learning in which one ART module builds on
knowledge from a prior ART module. Predictive ART is another name for it.
* FARTMAP is a supervised ART architecture that also incorporates fuzzy logic.


Fundamental Architecture of Adaptive Resonance Theory (ART) 

Adaptive resonance theory describes a competitive, self-organizing neural network. It can
be unsupervised (ART1, ART2, ART3, etc.) or supervised (ARTMAP); the names of the
supervised algorithms typically end in "MAP". The fundamental ART model, however, is
unsupervised and is made up of the F1 layer, also known as the comparison field; the F2
layer, also known as the recognition field (which consists of the clustering units); and a
reset module (which acts as a control mechanism), as shown in Figure 3.35.
FIGURE 3.35
Basic architecture of adaptive resonance theory (ART) (labels: F1(a) layer, F1(b) layer, F2 layer).


The input layer F1(a) of the comparison field is made up of units $s_1, s_2, \ldots, s_n$, while
the interface layer F1(b) is made up of units $x_1, x_2, \ldots, x_n$. The input layer F1(a) does
not process the input pattern; it simply passes it on to the interface layer F1(b). That is,
$[s_1, s_2, \ldots, s_n]$ connects to $[x_1, x_2, \ldots, x_n]$. The interface layer F1(b) is responsible
for transmitting the input pattern to the recognition layer F2, which compares the input
pattern with the winning code vector. The F2 recognition layer is also known as the
competitive layer.

A unique cluster is represented by each unit of the F2 layer. The number of units in the
F2 layer is not fixed: ART1 offers the possibility of increasing the number of clusters.
An F2 unit may be in any of the following three states as the ART learns a pattern:
(1) Active: the unit has a positive activation and is "on." (2) Inactive: the activation is
zero and the unit is "off," but it still participates in the competition. (3) Inhibited: the
activation is zero and the unit is "off"; in addition, it is not permitted to compete further
while the current input pattern is being learned.


Advantages of ART 


* Stable: it is not disturbed by the large variety of inputs offered to its network
* Can be integrated and used with other techniques to get good results
* Unlike plain competitive learning, which cannot add new clusters, ART can add a new
cluster when one is needed


Applications 


* Mobile robot control
* Face recognition
* Land cover classification
* Medical diagnosis
* Signature verification
3.8 Standard Back-Propagation Architecture


This learning technique is applied to a multilayer feed-forward network whose units use a
continuous, differentiable activation function, as shown in Figure 3.36. It uses gradient
descent, and the error is propagated back to the hidden layer [52-54]. Training of the
back-propagation network (BPN) is done in three stages: feed forward, back propagation
of error, and weight update.


Architecture 

Neurons in the hidden and output layers have biases, whose activation is always 1. During
the feed-forward stage, information flows in the forward direction. During back
propagation, error signals are sent in the reverse direction. Any function that increases
monotonically and is differentiable can serve as an activation function. The error
generated at the output end is back-propagated through the output layer and hidden layer
to adjust the weight values.

To solve a problem that is not linearly separable, more than one perceptron is needed.
Combining their outputs in another perceptron produces the final result. A perceptron in
the first layer takes its input from the actual inputs, while a perceptron in the second
layer takes its input from the outputs of the first layer. The second-layer perceptron
therefore cannot tell whether the actual inputs to the first layer were on or off. A hard
threshold function removes information that is needed for further training, which is why
back propagation relies on a differentiable activation function.
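
The three training stages can be illustrated with a small pure-Python sketch for one hidden layer. The sigmoid activation, the squared-error measure E = 0.5 Σ (t - y)², the learning rate, and all function and variable names are assumptions made for this example; the text only requires a monotonically increasing, differentiable activation function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Stage 1 (feed forward): return hidden activations h and outputs y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
         for row, b in zip(W2, b2)]
    return h, y

def train_step(x, t, W1, b1, W2, b2, lr=0.5):
    """Run the three stages for one pattern (x, t); weights change in place."""
    h, y = forward(x, W1, b1, W2, b2)
    # Stage 2 (back propagation of error): error terms for output units,
    # then for hidden units, using the sigmoid derivative a * (1 - a).
    delta_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    delta_hid = [hj * (1.0 - hj) *
                 sum(delta_out[k] * W2[k][j] for k in range(len(W2)))
                 for j, hj in enumerate(h)]
    # Stage 3 (weight update): gradient descent on weights and biases.
    for k, dk in enumerate(delta_out):
        for j, hj in enumerate(h):
            W2[k][j] -= lr * dk * hj
        b2[k] -= lr * dk
    for j, dj in enumerate(delta_hid):
        for i, xi in enumerate(x):
            W1[j][i] -= lr * dj * xi
        b1[j] -= lr * dj
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))
```

The biases `b1` and `b2` act as weights on a constant input of 1, and `train_step` returns the error measured before the update so that progress can be monitored across epochs.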


FIGURE 3.36
Basic architecture of back-propagation neural network (labels: input, hidden layer, output, back propagation).
3.9 Boltzmann Machine Learning


Boltzmann machine learning is a stochastic learning process with a recurrent structure,
and it is the foundation of the early optimization methods used in ANNs. Geoffrey Hinton
and Terry Sejnowski created the Boltzmann machine in 1985. Hinton's description of the
algorithm is particularly clear [55]. According to Ackley and Hinton (1985), a surprising
characteristic of this network is that it uses only locally available information: the
weight change depends only on the behavior of the two units it connects, even though the
change improves a global measure. Some of the important characteristics of the recurrent
structure of Boltzmann machines are as follows [56].


* They are made up of stochastic neurons, each of which can be in one of two possible
states, 1 or 0.
* Some of the neurons are clamped (frozen state), and the others are adaptive (free state).
* Applying simulated annealing to a discrete Hopfield network turns it into a
Boltzmann machine.


Objective of Boltzmann Machine 

The fundamental objective of a Boltzmann machine is to optimize the solution of a problem.
Its role is to optimize the weights and quantities related to that particular problem.


Architecture 

The architecture of the Boltzmann machine is depicted in Figure 3.37. The diagram shows
that the units are arranged in a two-dimensional array. The weights on the connections
between units are -p, where p > 0, and the weights of the self-connections are b, where b > 0.


FIGURE 3.37 
Basic architecture of Boltzmann machine learning. 
4


Introduction to Genetic Algorithm 


4.1 Introduction 


Every process or model design involves choices aimed at achieving one or more objectives.
The study of optimization focuses on how to model this process mathematically and, within
the parameters of these models, how to make the best decisions [1-3]. Optimization is the
process of making something better. Any mathematical model of a process has a set of
inputs and a set of outputs, as shown in Figure 4.1.

Finding the input values that give the "best" output values is referred to as optimization.
In different contexts, the word "best" can mean different things, but in mathematics it
usually refers to maximizing or minimizing one or more objective functions by adjusting
the input parameters [4]. The whole range of possible values for the inputs makes up the
search space. The best solution lies at a certain point or set of points in this search
space, and finding that point or those points is the goal of optimization. Every
optimization problem consists of three components: an objective function, constraints,
and choice variables, as shown in Figure 4.2.

The phrase "formulating an optimization problem" refers to the process of turning a
real-world problem into these three components, expressed as mathematical equations and
parameters [5]. The objective function, which is frequently denoted by "f" or "z,"
represents a single quantity that can be maximized or minimized. "Minimize expense,"
"maximize flow rate," "maximize output voltage," and "minimize material removal rate"
are examples used in different process industries. The literature survey shows that
several objectives cannot be optimized simultaneously without first understanding how to
optimize a single objective function. It is best to understand the fundamentals of
optimization in a simpler setting before moving on to more complicated multi-objective
optimization approaches [6, 7].

The vector x represents the decision variables, which are the components of the situation
that are under control. These include both variables you can pick directly and variables
you can influence indirectly through the selection of other decision variables. For
instance, a variety of factors, such as liquid properties, pipe diameter, type of flow
sensor (contact type or non-contact type), and temperature of the process plant, jointly
determine the best achievable flow rate (the objective function). Each independent
variable in the formulation should be able to affect the objective function either
directly or indirectly through other decision variables [8-10].
FIGURE 4.1
Basic diagram of any process (set of inputs, process, set of outputs).


FIGURE 4.2
Components of optimization techniques (parameters of an objective function: choice of variables, constraints, objective function).


Any type of restriction on the values that the decision variables can take is referred to
as a constraint. The most obvious constraints are those that directly restrict your
options, such as laws requiring you to maintain equipment to a certain standard, limits on
the highest flow rate allowed in the process industry, and requirements that each
independent and dependent variable stay within a minimum and maximum range. As an
illustration, suppose you have 60 feet of fence to work with and want to enclose the
greatest rectangular area; what should the dimensions of the fenced-off space be?

The first step is to introduce mathematical notation. Let L and W represent the choice
variables, length and width, respectively. The objective function is the area, LW, which
is to be maximized. The perimeter limitation can be written as $2L + 2W \le 60$. Finally,
the variables have nonnegativity constraints, i.e., $L \ge 0$ and $W \ge 0$. All of this
information is concisely represented in the following way:

$$
\begin{aligned}
\text{Max:}\quad & LW \\
\text{Subject to}\quad & 2L + 2W \le 60 \\
& L \ge 0,\; W \ge 0
\end{aligned}
$$
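
A quick numerical check of the fence problem is sketched below. It assumes that all of the fence is used (enclosing more fence never reduces the best area, so the perimeter constraint is taken as an equality, W = 30 - L) and simply scans L over an evenly spaced grid; the function name and grid size are choices made for the example.

```python
def best_rectangle(fence=60.0, steps=600):
    """Scan lengths L in [0, fence/2] and return (area, L, W) with largest area."""
    best = (0.0, 0.0, 0.0)
    for k in range(steps + 1):
        length = (fence / 2.0) * k / steps
        width = fence / 2.0 - length  # perimeter constraint used with equality
        area = length * width
        if area > best[0]:
            best = (area, length, width)
    return best
```

On this grid the scan returns an area of 225 square feet at L = W = 15 feet, that is, a square.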


4.2 Optimization Problems 


Many techniques are applied to accomplish the objective of a mathematical model. Most of
the time, the objective is to determine a function's maximum or minimum value: the
shortest amount of time necessary to compute a task, the least amount of money spent, the
most power needed by a device to accomplish its work, and so forth [11-13]. Problems of
this kind are solved by identifying the proper function and then applying methods that
find the values of the input variables at which it attains its maximum or minimum.
Typically, such a problem has the following mathematical form: for $a \le x \le b$, where
a or b may be infinite, find the largest (or smallest) value of f(x). The question of
interest is: over the interval from a to b, what are the smallest and largest values of
the function f(x)?
4.2.1 Steps for Solving the Optimization Problem


1. Identify the quantity to be maximized or minimized (the objective function) and the
constraints for the present research.

2. Label the input variables and the output variables or objective functions.

3. Label the units of each input and output variable, for example, D for the diameter
of a PVC pipe in centimeters or ρ for the density of the liquid in grams per
cubic centimeter.

4. Formulate the objective functions to be maximized or minimized.

5. Express the objective function in terms of the independent variables.

6. Apply a computational algorithm to find the optimum of the objective function while
keeping in mind the constraints on the independent variables.

As an example, determine the sides of a rectangle of area 100 square units that has the
shortest perimeter. First, this condition is turned into a purely mathematical problem:
determine the lowest value of the objective function, which is the perimeter of the
rectangle.


Let x denote one of the sides of the rectangle; then the adjacent side is $100/x$.

The function that we aim to minimize is hence

$$f(x) = 2x + 2\cdot\frac{100}{x}, \qquad x > 0 \text{ and } \frac{100}{x} > 0.$$

Next we find the value of x at which $f'(x)$ is zero:

$$f'(x) = 2 - \frac{200}{x^2} = 0$$

Solving, we get $x = \pm 10$, of which only x = 10 is of interest.

Since $f'(x)$ is defined throughout the range $(0, \infty)$, there are no other critical
values and no end points to consider. To decide whether the critical value is a relative
maximum, a relative minimum, or neither, the second-order derivative is needed:

$$f''(x) = \frac{400}{x^3} > 0 \quad \text{for } x = 10;$$

hence, there is a relative minimum at x = 10.

As there is only one critical value, the smallest perimeter of the rectangle is
2 × (10 + 10) = 40 units, and it is attained only when the rectangle is a square.
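
The analytical result can be cross-checked numerically. The sketch below applies a ternary search, which is valid here because the perimeter function has a single minimum on the positive axis; the search interval [1, 100], the tolerance, and the function names are assumptions made for the example.

```python
def perimeter(x, area=100.0):
    """Perimeter of a rectangle with one side x and the given area."""
    return 2.0 * x + 2.0 * area / x

def minimize_unimodal(f, lo, hi, tol=1e-6):
    """Ternary search for the minimizer of a function with a single minimum."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2.0
```

For example, `minimize_unimodal(perimeter, 1.0, 100.0)` returns a value very close to 10, where the perimeter is close to 40, in agreement with the derivative-based solution.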


4.2.2 Point-to-Point Algorithms (P2P) 


This basic optimization problem has recently attracted a lot of interest, and significant
progress has been made. After a brief history of earlier findings, this subsection
highlights contemporary heuristic algorithms that solve the problem while looking at only
a small part of the (very large) input graph [14]. Point-to-point (P2P) algorithms find
exact shortest paths. They are heuristic in the sense that they work well only on specific
classes of graphs: although they have performed well in experiments, there are no known
theoretical bounds that explain the results. Most of these algorithms are motivated by the
need to find routes in large road networks. P2P algorithms build on two classical
algorithms, Dijkstra's algorithm and its bidirectional variant, which were developed in
the 1950s and 1960s, respectively.
The P2P problem is formulated as follows: find the shortest path from the source vertex s
to the destination vertex t in a directed graph G = (V, A) with nonnegative arc lengths
[15, 16]. Exact shortest paths are required. The amount of precomputed data produced in
the preprocessing step is limited to a constant multiple of the size of the input graph,
and practical considerations impose a time limit on the preprocessing phase.
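
As a baseline for the point-to-point problem, the sketch below shows plain Dijkstra's algorithm that stops as soon as the destination t is scanned. It involves no preprocessing; the adjacency-list format (a dictionary mapping each vertex to a list of `(neighbor, length)` pairs) and the function name are assumptions made for the example.

```python
import heapq

def dijkstra_p2p(graph, s, t):
    """Length of a shortest s-t path; arc lengths must be nonnegative."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    scanned = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in scanned:
            continue  # stale heap entry
        scanned.add(v)
        if v == t:
            return d  # the label of t is final once t is scanned
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return float("inf")  # t is not reachable from s
```

Stopping at t means that only the vertices closer to s than t are scanned, which is the "ball around s" that goal-directed methods try to shrink.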


4.2.3 A* Search Algorithm


The A* search method was initially proposed to speed up search in large, implicitly
represented game graphs. A* search is also known as goal-directed search or heuristic
search [17-19]. The idea is to bias the search toward t rather than searching a ball
around s. The algorithm uses a (perhaps domain-specific) function $\pi_t : V \to \mathbb{R}$
such that $\pi_t(v)$ gives an estimate of the distance from v to t. Define the (forward
search) key of v as $k_s(v) = d_s(v) + \pi_t(v)$, where $d_s(v)$ is the distance label of v
from s. Only one thing separates A* search from Dijkstra's algorithm: at each step, the
former chooses the labeled vertex v with the smallest key to scan next, as opposed to the
one with the smallest $d_s(v)$ value. A* search is equivalent to Dijkstra's algorithm on
the graph with the reduced length function $\ell_{\pi_t}(v, w) = \ell(v, w) - \pi_t(v) + \pi_t(w)$.
The algorithm is correct if $\pi_t$ is feasible, that is, if $\ell_{\pi_t}$ is nonnegative
for all arcs.
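
A minimal A* sketch follows. It differs from Dijkstra's algorithm only in the key used to order the heap. The estimate used here is the straight-line distance between vertex coordinates, which is an assumption made for illustration; it is feasible whenever every arc is at least as long as the straight line between its endpoints. The `coords` dictionary and the function names are hypothetical.

```python
import heapq
import math

def a_star(graph, coords, s, t):
    """Shortest s-t distance using key(v) = d(v) + estimate(v)."""
    def estimate(v):
        (x1, y1), (x2, y2) = coords[v], coords[t]
        return math.hypot(x1 - x2, y1 - y2)

    d = {s: 0.0}
    heap = [(estimate(s), s)]
    scanned = set()
    while heap:
        _, v = heapq.heappop(heap)
        if v == t:
            return d[v]
        if v in scanned:
            continue  # stale heap entry
        scanned.add(v)
        for w, length in graph.get(v, []):
            nd = d[v] + length
            if nd < d.get(w, float("inf")):
                d[w] = nd
                heapq.heappush(heap, (nd + estimate(w), w))
    return float("inf")
```

Replacing `estimate` with a function that always returns zero turns this back into Dijkstra's algorithm.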


4.2.4 Simulated Annealing 


Hill climbing allows only upward steps, whereas simulated annealing also allows downward
steps [20], which lets it aim for the global maximum rather than stopping at the nearest
local one. It evaluates candidate solutions in the neighborhood of the current solution.
Simulated annealing imitates the metallurgical technique of annealing, in which a material
is heated to a high temperature and then cooled. As the material cools into a pure
crystal, impurities are frequently eliminated because the atoms move unpredictably at high
temperatures. The simulated annealing optimization process replicates this, with the
energy state corresponding to the current solution [21, 22].

We provide a starting temperature, which is typically set to 1, and a limiting
temperature, on the order of $10^{-4}$. The current temperature is repeatedly multiplied
by a factor alpha (between 0 and 1) until it falls below the limiting temperature. For
each temperature value, the core optimization step is performed a predetermined number of
times. The step consists of finding a neighboring solution and accepting it with
probability $e^{(f(c)-f(n))/T}$, where c is the current solution, n is the neighboring
solution, and T is the current temperature. A neighboring solution is found by slightly
perturbing the current solution. This randomness helps avoid the usual problem of
optimization methods becoming stuck in local minima. By occasionally accepting a solution
that is worse than the current one, with a probability that decreases as the increase in
cost grows, the method is more likely to end up close to the global optimum. Figure 4.3
depicts how the objective function of simulated annealing behaves over the state space.

The design of a neighbor function can be challenging and must be done on a case-by-case
basis, although the following principles can help in identifying neighbors in location
optimization problems:

* Move every point 0 or 1 units in a random direction.
* Shift the input elements at random.
* Swap random elements of the input sequence.
* Permute the input sequence.
* Divide the input sequence into a random number of segments, then permute the segments.
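
The procedure can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the book's MATLAB code: the cost is minimized, a worse neighbor is accepted with probability exp((f(c) - f(n)) / T) (the common form in which the cost difference is divided by the current temperature), and the parameter names and default values are chosen for the example.

```python
import math
import random

def simulated_annealing(cost, neighbor, start, t_start=1.0, t_min=1e-4,
                        alpha=0.9, iters_per_temp=100, seed=0):
    """Minimize cost(x); neighbor(x, rng) must return a perturbed copy of x."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_min:
        for _ in range(iters_per_temp):
            cand = neighbor(current, rng)
            cand_cost = cost(cand)
            # Improvements are always accepted; worse moves are accepted
            # with probability exp((f(c) - f(n)) / T).
            if (cand_cost <= current_cost or
                    rng.random() < math.exp((current_cost - cand_cost) / temp)):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= alpha  # cooling: multiply the temperature by alpha
    return best, best_cost
```

The function keeps track of the best solution seen so far, because the current solution may move to a worse point late in the run.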
FIGURE 4.3
Objective function versus state space of simulated annealing (curve labels: global optimum, local optimum).


Advantages: 


* Easy to code, even for complex problems.
* Generally gives a good solution.
* Statistically guarantees finding a globally optimal solution.


Disadvantages:


* The algorithm runs very slowly.
* The algorithm cannot tell whether it has found an optimal solution.


4.2.5 Genetic Algorithm (GA) 


Genetic algorithms are optimization algorithms that draw inspiration from biology.
Evolution is a theory put forward by Charles Darwin that explains how species develop
biologically through mate selection and the survival of the fittest [23-25].
Deoxyribonucleic acid (DNA) is a representation created through evolution: it encodes
organisms, and evolutionary processes operate on it. The continuous cycle of artificial
evolution based on the ideas of natural evolution is shown in Figure 4.4. The evolutionary
process begins with randomly or purposefully initialized solutions. The evolutionary cycle
starts with the crossover operator, which combines two or more solutions into a new one,
and the result is then mutated. The best solutions of this generation are chosen for the
subsequent generation. The cycle then checks whether the stopping criterion has been
satisfied and, if not, repeats. The most basic form of genetic algorithm, however, uses
only one parent, which is altered to produce a child. The selection operator chooses the
better of parent and child. Recombination is not used because each generation has a single
parent. Crossover and mutation operators can be designed for practically all solution
representations. Figure 4.4 represents the complete cycle of the genetic algorithm.


FIGURE 4.4
Genetic algorithm complete cycle (labels: initialization, crossover, mutation, fitness, selection, termination).
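
The single-parent form of the algorithm is short enough to sketch in full. The example below assumes a bit string representation, a fitness function that is to be maximized, a bit-flip mutation with probability 1/N per bit, and a fixed number of generations as the stopping criterion; all of these, and the function name, are choices made for illustration.

```python
import random

def one_plus_one_ga(fitness, n_bits, generations=200, seed=0):
    """Single-parent cycle: mutate the parent, keep the better of the two."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]  # initialization
    for _ in range(generations):  # stopping criterion: fixed budget
        # Mutation: flip each bit independently with probability 1/n_bits.
        child = [1 - b if rng.random() < 1.0 / n_bits else b for b in parent]
        # Selection: the child replaces the parent if it is at least as fit.
        if fitness(child) >= fitness(parent):
            parent = child
    return parent
```

For instance, `one_plus_one_ga(sum, 10)` searches for the bit string of length 10 with the most ones; the fitness of the retained solution never decreases from one generation to the next.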


4.2.5.1 Motivation of GA 


Genetic algorithms are appealing for optimization problems because they can often yield
good results faster than other algorithms. The following list outlines why GAs are needed.


1. Solving Complex Problems
For large problems, conventional numerical programming takes a long time to run
even on a powerful computing system. In such cases GA proves its efficiency by
providing a near-optimal output within a short period of time.

2. Failure of Gradient-Based Methods
Traditional algorithms such as the hill-climbing method start from an initial
point and move in the direction of the top of the hill. Such algorithms are
suitable for single-peaked objective functions, but most complex problems have
many peaks and valleys, so these methods tend to get stuck at a local optimum
instead of reaching the global one, as shown in Figure 4.5.


FIGURE 4.5
Objective function of a gradient-based searching method (objective function value versus parameter value; labels: random start, local optima, global optimum).
3. Getting a Good Solution Fast
Common research examples of hard real-world problems include the travelling
salesperson problem (TSP), optimum power flow, and very large-scale integration
(VLSI) circuit design. When such problems are solved with a computational
algorithm, a long running time is not acceptable; a good-enough solution that is
obtained quickly is required.


4.2.5.2 Basic Terminology 


It is important to be familiar with certain basic terms that are used throughout
applications of genetic algorithms. Figure 4.6 explains the different terms of the genetic
algorithm.


Population: A set of possible (encoded) solutions to a given problem. It is analogous
to a population of people, except that its members are candidate solutions.

Chromosome: One such solution to the given problem.

Gene: One element position of a chromosome; the value it takes is called an allele.

Genotype and phenotype: The genotype is the population in the computation space
manipulated by a computing system, while the phenotype is the population in the
solution space of the real-world problem.

Decoding and encoding: These are inverse processes for converting between genotype
and phenotype. For a given problem, decoding converts the genotype to the
phenotype, and encoding converts the phenotype to the genotype. Both are needed
to calculate the fitness value effectively.

Fitness function: A function that takes a solution as input and returns a measure of
how suitable that solution is.

Genetic operators: These alter the genetic composition of the offspring. They include
crossover, mutation, selection, and so on. Each of the genetic operators is
explained in the next section.


FIGURE 4.6
Basic metrics of genetic algorithm. Basic terminology: population (candidate solution), chromosome (solution for a given problem), genotype and phenotype (computation process space), genetic operator (composition of offspring), fitness function (function which produces suitable solution corresponding to input), encoding and decoding (conversion process between genotype and phenotype).
4.2.5.2.1 Crossover


Crossover produces one or more new solutions by combining the genetic material of two or
more parents. In nature, most organisms have two parents; a few exceptions have no
distinct sexes and therefore only one parent [26, 27]. The first stage in nature is the
choice of a potential mate. Many species invest a lot of energy in selecting a partner and
use various strategies to attract one. The second stage, after a mate has been selected,
is pairing. From a biological standpoint, two individuals of the same species mix their
genetic material and produce offspring. Crossover operators in genetic algorithms likewise
combine the genetic material of the parents. N-point crossover is a popular operator for
bit string representations. It splits two solutions at n positions and reassembles the
pieces alternately into new solutions, as shown in Figure 4.7.

The motivation for such an operator is that each parent may contain good parts of a
solution, so that an offspring combining their best properties can outperform both
parents. The operator is easily extended to more crossover points, with the solutions
split and reassembled alternately. Arithmetic crossover computes the component-wise
average of all the parental solutions. For example, for the two parents (2, 5, 3) and
(4, 3, 5), the offspring is (3, 4, 4). This crossover operator can be extended to more
than two parents. For the GA considered here, the crossover rate is taken as a fixed
value of 0.5.
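
Both operators can be sketched in a few lines of Python. The function names and the list-based representation are assumptions made for illustration; the bit strings in the usage note are the parents of Figure 4.7, and the arithmetic example uses the parents (2, 5, 3) and (4, 3, 5) from the text.

```python
def n_point_crossover(p1, p2, points):
    """Split both parents at the given cut points and swap alternate segments."""
    c1, c2 = list(p1), list(p2)
    swap = False
    prev = 0
    for cut in sorted(points) + [len(p1)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = c2[prev:cut], c1[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2

def arithmetic_crossover(parents):
    """Component-wise mean of any number of real-valued parents."""
    return [sum(genes) / len(parents) for genes in zip(*parents)]
```

With the parents of Figure 4.7 and a single cut after the fourth bit, `n_point_crossover(p1, p2, [4])` reproduces the two offspring shown in the figure, and `arithmetic_crossover([(2, 5, 3), (4, 3, 5)])` gives `[3.0, 4.0, 4.0]`.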


4.2.5.2.2 Fitness 


During fitness computation, the phenotype is used to evaluate the fitness function. The
fitness function measures the quality of the solutions produced by the GA [28]. In
optimization, the construction of the fitness function is one of the important parts of
the mathematical modeling process. The practitioner can influence the design of the
fitness function and thereby guide the search. For instance, the fitness of infeasible
solutions may be degraded, as in the case of penalty functions. When multiple objective
functions need to be optimized, the fitness values of the individual objectives can be
combined into a weighted sum. This and other techniques are used to handle multiple
objective functions simultaneously.
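
A weighted sum with a penalty term can be sketched as follows. The example assumes a fitness that is to be maximized and a constraint written as a violation amount (positive when the constraint is broken); the function name, the `penalty` factor, and the reuse of the 60-foot fence problem from Section 4.1 are choices made for illustration.

```python
def weighted_sum_fitness(objectives, weights, x, violation=None, penalty=1000.0):
    """Weighted sum of objective values, reduced for infeasible solutions."""
    value = sum(w * f(x) for w, f in zip(weights, objectives))
    if violation is not None:
        # Penalty function: subtract in proportion to the constraint violation.
        value -= penalty * max(0.0, violation(x))
    return value

# Fence example: x = (L, W), maximize area subject to 2L + 2W <= 60.
area = lambda p: p[0] * p[1]
neg_perimeter = lambda p: -(2 * p[0] + 2 * p[1])
excess_fence = lambda p: 2 * p[0] + 2 * p[1] - 60
```

Here a feasible 15 by 15 rectangle keeps its area of 225 as fitness, while an infeasible 20 by 20 rectangle is pushed far below it by the penalty. The choice of weights and penalty factor determines how the competing goals are traded off.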


First parent:     0 0 1 0 | 1 1 0 0 1 0
Second parent:    1 1 1 1 | 0 1 0 1 1 1

Crossover at point 4 takes place and produces two offspring candidate solutions:

First offspring:  0 0 1 0 | 0 1 0 1 1 1
Second offspring: 1 1 1 1 | 1 1 0 0 1 0


FIGURE 4.7
Example of crossover of the two parents.
Solution quality should be decisive for the fitness value. Though this may seem obvious,
further thought is frequently necessary. A poorer solution should logically receive a
worse fitness value. But should an infeasible solution that is very near the global
optimum receive a worse fitness than a feasible but mediocre solution? In a
multi-objective optimization problem, should a solution that is close to the optimum of
the first objective receive a worse fitness than one that is farther from the first
optimum but considerably closer to the optimum of a second, much more important
objective? Answering these questions comes down to selecting appropriate weights for
multi-objective optimization.

A main goal is to reduce the number of fitness function calls. The effectiveness of a
genetic algorithm is typically assessed by how many fitness function evaluations are
needed before the optimal solution is found with the desired precision. Keeping the number
of fitness function calls low is crucial because each call may be time-consuming and
expensive. A good illustration of a lengthy fitness evaluation is a machine learning
pipeline that is optimized using a genetic algorithm. For each evaluation, the machine
learning pipeline must be trained on the provided dataset. To avoid overfitting, training
must be repeated with cross-validation, which lengthens the process. The accuracy of the
prediction model must then be evaluated on a test set to obtain a score that can be used
as the fitness function value.


4.2.5.2.3 Mutation 


Mutation operators alter a solution by perturbing it. Mutation is based on random changes
[29]. The strength of this perturbation is called the mutation rate; in continuous
solution spaces the mutation rate is also referred to as the step size. There are three
main requirements for mutation operators. The first is reachability: every point in the
solution space must be reachable from any other point, so that every region of the
solution space has a reasonable chance of being reached. Otherwise, there is a low
likelihood that the best solution will be discovered. Not all mutation operators can
guarantee this property; decoder techniques, for instance, struggle to cover the entire
solution space. The second requirement is unbiasedness. In unconstrained solution spaces
without plateaus, the mutation operator should not cause the search to drift in a
particular direction. In constrained solution spaces, bias may be helpful. The idea of
novelty search, which seeks out unexplored regions of the solution space, also introduces
a bias into the mutation operator.

Scalability is the third design principle for mutation operators. Each operator should
offer a degree of freedom that allows its strength to be adapted. This is usually possible
for mutation operators that are based on a probability distribution. For example, in
Gaussian mutation, which is based on the Gaussian distribution, the standard deviation
scales the randomly drawn samples. The mutation operators are implemented according to
the chosen representation. For bit strings, bit-flip mutation is frequently used. Bit-flip
mutation changes a zero bit to a one and vice versa with a predetermined probability,
which acts as the mutation rate. It is typically chosen according to the length of the
representation: for a bit string of length N, each bit is flipped with probability 1/N.
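
The two mutation operators mentioned above can be sketched as follows. The function names and the explicit random generator argument are assumptions made for illustration; `rate` defaults to 1/N as described in the text, and `sigma` is the standard deviation that scales the Gaussian perturbation.

```python
import random

def bit_flip_mutation(bits, rng, rate=None):
    """Flip each bit independently with probability rate (default 1/N)."""
    if rate is None:
        rate = 1.0 / len(bits)
    return [1 - b if rng.random() < rate else b for b in bits]

def gaussian_mutation(x, rng, sigma):
    """Add zero-mean Gaussian noise with standard deviation sigma to each gene."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]
```

A call such as `bit_flip_mutation([0, 1, 0, 1], random.Random(1))` flips on average one bit per call, and increasing `sigma` in `gaussian_mutation` strengthens the perturbation, which is the scalability property.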


4.2.5.2.4 Selection 


The best offspring solutions must be chosen as parents of the new parental population
in order to enable convergence toward optimal solutions. A surplus of offspring is
produced, and the best of them are chosen in order to move closer to the optimum. This
selection procedure is based on the fitness values of the population [30, 31]. Low fitness
values are preferable in minimization tasks, and high fitness values in maximization
tasks. By negation, a minimization task can easily be turned into a maximization task.
Elitist selection operators choose the best offspring solutions as parents. GA researchers
commonly use two selection schemes: comma selection selects the μ best solutions from λ
offspring, and plus selection selects the μ best solutions from λ offspring and μ parents.
Besides these two schemes, further selection algorithms whose operation is based on
randomness are used in research.
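As a rough sketch of the two schemes, assuming minimization and a user-supplied `fitness` function (both function names are hypothetical):

```python
def comma_selection(offspring, fitness, mu):
    """(mu, lambda) selection: keep the mu best of the lambda offspring only."""
    return sorted(offspring, key=fitness)[:mu]

def plus_selection(parents, offspring, fitness, mu):
    """(mu + lambda) selection: keep the mu best of parents and offspring together."""
    return sorted(parents + offspring, key=fitness)[:mu]

def sphere(x):
    """Toy objective to be minimized."""
    return sum(v * v for v in x)

parents = [[0.1, 0.0], [2.0, 2.0]]
offspring = [[1.0, 1.0], [0.5, 0.5], [3.0, 0.0], [0.2, 0.9]]
next_comma = comma_selection(offspring, sphere, mu=2)
next_plus = plus_selection(parents, offspring, sphere, mu=2)
```

In this example the good parent `[0.1, 0.0]` survives only under plus selection; comma selection forgets all parents, which can help the search leave a local optimum.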

Roulette wheel selection, also referred to as fitness proportionate selection, selects
parental solutions at random using a uniformly distributed random number. The probability
that a solution is selected depends on its fitness: the share of a solution's fitness in
the total fitness of the population can be interpreted as its selection probability. The
benefit of fitness proportionate selection operators is that every solution has a positive
probability of being chosen.
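A minimal sketch of roulette wheel selection follows. It assumes a maximization task with strictly positive fitness values; the function name and the `rng` argument are illustrative choices, not part of the original text.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability fitness / total fitness.

    Assumes maximization and strictly positive fitness values.
    """
    total = sum(fitnesses)
    r = rng.uniform(0.0, total)          # uniformly distributed spin of the wheel
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if r <= cumulative:
            return individual
    return population[-1]                # guard against floating-point round-off

rng = random.Random(42)
picks = [roulette_wheel_select(["a", "b", "c"], [1.0, 3.0, 6.0], rng=rng)
         for _ in range(1000)]
```

For a minimization task the fitness values would first have to be transformed, for example by negation and shifting, so that better solutions receive larger positive shares.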

Another well-known selection method is tournament selection, in which a subset of
solutions is chosen at random and the best solutions of this subset are selected as new
parents. Even a solution with worse fitness than other solutions in the population
therefore has a chance to prevail in tournament selection. When selection is used to
determine the parents of the next generation, it is called survival selection: the
selection operator determines which solutions survive and which ones perish. This
viewpoint reflects Darwin's principle of the survival of the fittest.
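Tournament selection can be sketched as follows, assuming minimization; the tournament size `k` and the function name are hypothetical parameters chosen for illustration.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Sample k distinct individuals uniformly and return the fittest (minimization)."""
    contestants = rng.sample(population, k)
    return min(contestants, key=fitness)

def distance_to_seven(x):
    """Toy fitness: smaller is better."""
    return abs(x - 7)

rng = random.Random(3)
population = list(range(20))
winner = tournament_select(population, distance_to_seven, k=3, rng=rng)
```

A small `k` gives weak selection pressure, so mediocre solutions win fairly often; setting `k` to the population size always returns the best individual.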


4.2.5.2.5 Termination 


The termination condition determines when the main evolutionary loop ends. The
genetic algorithm frequently runs for a specified number of generations, which is
reasonable in a variety of experimental settings. The length of the optimization process
may be limited by the time and cost of fitness function evaluations. Convergence of the
optimization process is another useful termination condition. Progress in fitness function
improvements may slow down drastically when approaching the optimum. The evolutionary
process is stopped if no significant progress is observed. In continuous optimization
problems, stagnation in a local optimum can be interpreted in two ways.

In the first scenario, stagnation in a local optimum means that the global optimum has
been missed. In such cases, the program should be run again with a different number of
generations. In the second scenario, if the genetic algorithm consistently reaches the
same region of the solution space when started from different regions, it is improbable
that a better local optimum would be discovered. This suggests that the local optimum is a
strong attractor. Alternatively, the local optimum may be the global one.
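The two termination conditions, a generation budget and stagnation of the best fitness, can be combined in one check. The sketch below assumes minimization; `patience` and `tol` are hypothetical tuning parameters introduced for this example.

```python
def should_stop(best_history, max_generations=200, patience=20, tol=1e-8):
    """Decide whether the evolutionary loop should end.

    best_history[g] is the best fitness found up to generation g (minimization).
    """
    if len(best_history) >= max_generations:
        return True                       # generation budget exhausted
    if len(best_history) > patience:
        gain = best_history[-patience - 1] - best_history[-1]
        return gain < tol                 # no significant progress for `patience` generations
    return False
```

In a GA loop, the best fitness of each generation would be appended to `best_history` and `should_stop(best_history)` checked once per generation.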


4.2.5.3 Experiments 


Since the beginning of research on genetic algorithms, experimentation has been the
primary analytical instrument. Consequently, well-designed experiments are crucial.
Formulating a research question is the first task and comes before the experimental
analysis. Because genetic algorithms are stochastic, each run produces a different result,
so the researcher needs to select the best results with care. A single run that produces a
poor result and does not reach the optimum can distort the average statistics. To obtain
sound statistics, any stochastic algorithm should be run at least 25 times; 50 or 100 runs
are preferable. In the extreme case of exceptionally costly optimization runs, 15, 10, or
even 5 runs may be an acceptable compromise.
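A small sketch of such a repeated-run experiment is given below. The optimizer is a stand-in (plain random search on a sphere function), used only so that the example is self-contained; the 25 seeds follow the recommendation above, and reporting the median next to the mean is a suggestion of this sketch because the median is less sensitive to a single poor run.

```python
import random
import statistics

def random_search(seed, evaluations=200):
    """Stand-in stochastic optimizer: best sphere value found by uniform sampling."""
    rng = random.Random(seed)
    return min(sum(rng.uniform(-5, 5) ** 2 for _ in range(3))
               for _ in range(evaluations))

results = [random_search(seed) for seed in range(25)]   # 25 independent runs
summary = {
    "best": min(results),
    "median": statistics.median(results),
    "mean": statistics.mean(results),
    "std": statistics.stdev(results),
    "worst": max(results),
}
```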


4.2.5.4 Parameter Tuning Technique in Genetic Algorithm 


The selection of proper parameters has a substantial impact on the effectiveness of genetic 
algorithm optimization procedures. The issue of how to choose the best parameter options 
is the biggest challenge for the researcher. Additionally, certain parameter setting and tun- 
ing activities end up being dynamic optimization problems since the best option changes as 
the optimization process progresses. There are taxonomies that distinguish between exog- 
enous and endogenous variables. Exogenous parameters are general genetic algorithm 
parameters that define universal characteristics like population sizes and selection pres- 
sure, and chromosome-related properties are defined by endogenous parameters [31, 32]. 

Parameters can be tuned before a GA is run and controlled while it runs. Control
techniques are designed to help the algorithm find suitable parameter values during the
run. Deterministic control techniques adjust the parameters according to fixed schemes
that depend, for example, on the number of generations. Rechenberg's mutation rate control
is one example of an adaptive parameter control strategy that makes use of feedback from
the search. The automatic regulation of parameters by a secondary evolutionary
optimization process is known as self-adaptation. Most parameter tuning and control
techniques are widely applicable: they require only modest adjustments to be applied to
the majority of genetic algorithms.
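Rechenberg's rule is usually stated as the 1/5 success rule: if more than one fifth of recent mutations improved the solution, the step size is increased, otherwise it is decreased. The sketch below assumes this formulation; the adaptation factor 1.22 is a commonly quoted value and is an assumption here, not a value given in the text.

```python
def one_fifth_rule(sigma, success_rate, factor=1.22):
    """Adapt the mutation step size sigma from the observed success rate."""
    if success_rate > 0.2:
        return sigma * factor    # many successes: take larger steps
    if success_rate < 0.2:
        return sigma / factor    # few successes: take smaller steps
    return sigma                 # exactly 1/5: leave sigma unchanged

sigma = 1.0
sigma = one_fifth_rule(sigma, success_rate=0.05)   # the search is failing, so shrink
```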


4.2.5.5 Strategy Parameters 


Numerous variation operators and selection strategies have been covered in previous
sections. These choices cut both ways. Provided that you have sufficient knowledge of the
solution landscape, they give you the chance to obtain better solutions [33-35]. Darwin's
natural selection concept is incorporated into a straightforward optimization and learning
system to achieve a reliable and general solution of any complex problem. But in the real
world, there is no free lunch. We now face two problems: finding the optimal solution to
the original problem, and determining the optimal operators and their optimal parameters,
which are called the strategy parameters. Additionally, there are two categories of
elements that might potentially impact the optimization results: global factors and local
factors, which are illustrated in Figure 4.8.


[Figure: Elements affecting the optimization. Local factors: parameters that affect
individuals, e.g., crossover and mutation. Global factors: parameters that affect the
population level, e.g., selection type, population size, stopping criterion.]


FIGURE 4.8
Types of factors in optimization techniques.
TABLE 4.1
Survey on Parameter Settings

Sl. No. | Reference | Methodology or Parameter Settings
1 | Nejati et al. [36] | Suggests an ideal approach based on theoretical analysis, although its strong assumptions are hard to satisfy in real-time problems.
2 | De Jong [37] | Proposes pop size = 50, p_c = 0.60, p_m = 0.001, G = 1, and elitism as the appropriate strategy parameters for his test functions.
3 | Grefenstette [38] | Employs a meta-level GA to regulate the parameters of the technique in the optimization model. The best objective value is obtained for the following strategy parameters: pop size = 30(80), p_c = 0.95, p_m = 0.01, G = 1, and elitism.


[Figure: Parameter setting divides into parameter tuning and parameter control. Parameter
control divides into deterministic control, adaptive control, and self-adaptive control.]


FIGURE 4.9
Strategy of parameter setting.


Table 4.1 summarizes the parameter settings that several researchers have proposed to
make optimization algorithms work better.

According to the taxonomy [39] of strategy parameter setting, parameter control can be 
classified into three groups as shown in Figure 4.9. 


1. Deterministic control. Heuristic rules alter the strategy parameters, usually
depending only on the generation number.


2. Adaptive control. Heuristic rules that are based on feedback from the current or
prior population are used to modify the strategy parameters.


3. Self-adaptive control. The strategy parameters are encoded into the chromosomes
and evolve together with the solutions.
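Two of these groups can be contrasted in a short sketch. The exponential schedule and the log-normal update of the step size are common textbook choices and are assumptions of this example, as are all names and default values.

```python
import math
import random

def deterministic_rate(generation, p0=0.1, decay=0.01):
    """Deterministic control: the mutation rate depends only on the generation counter."""
    return p0 * math.exp(-decay * generation)

def self_adaptive_mutation(x, sigma, rng=random):
    """Self-adaptive control: sigma is part of the chromosome and is mutated first."""
    tau = 1.0 / math.sqrt(len(x))                    # learning rate (assumed setting)
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return new_x, new_sigma

rng = random.Random(7)
child, child_sigma = self_adaptive_mutation([0.5, -1.0, 2.0], sigma=0.3, rng=rng)
```

Adaptive control sits between the two: it changes a parameter using feedback from the search, such as the fraction of successful mutations in recent generations.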


4.3 Constrained Optimization 


In the problems considered so far, we only looked at the region bounded by the upper and
lower bounds of the variables, whereas real-world optimization problems usually involve
additional constraints [40-42]. When employing EAs in constrained optimization, it is
imperative to address how to assess a solution that violates some constraints. Typically,
we require that all constraints be satisfied in the final results of our EAs. However,
simply discarding every solution that violates a constraint is exceedingly inefficient.
Constrained optimization problems (COPs) can be described in the following way:


$$
\begin{aligned}
\min\; & p(x), \quad x \in \mathbb{R}^n \\
\text{such that}\; & g_i(x) \le 0, \quad i = 1, 2, \ldots, q \\
& r_j(x) = 0, \quad j = q+1, q+2, \ldots, k \\
& L_i \le x_i \le U_i, \quad i = 1, 2, \ldots, n
\end{aligned}
$$


where $L_i$ and $U_i$ are the lower and upper bounds of variable $x_i$, respectively, which
form the search space S. The q inequality constraints and the $k - q$ equality constraints
(each linear or nonlinear) need to be satisfied.
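One common way to assess infeasible solutions instead of discarding them is a static penalty function. The text does not prescribe a particular technique, so the sketch below is only an illustration; `weight` and the equality tolerance `eps` are hypothetical parameters.

```python
def penalized_objective(p, inequality, equality, x, weight=1e3, eps=1e-6):
    """Static penalty: p(x) + weight * total constraint violation.

    inequality: functions g_i with g_i(x) <= 0 when satisfied.
    equality:   functions r_j with r_j(x) = 0 when satisfied (tolerance eps).
    """
    violation = sum(max(0.0, g(x)) for g in inequality)
    violation += sum(max(0.0, abs(r(x)) - eps) for r in equality)
    return p(x) + weight * violation

def p(x):
    """Example objective: minimize x0^2 + x1^2."""
    return x[0] ** 2 + x[1] ** 2

def g1(x):
    """Example constraint x0 + x1 >= 1, written as 1 - x0 - x1 <= 0."""
    return 1.0 - x[0] - x[1]

feasible_value = penalized_objective(p, [g1], [], [0.5, 0.5])     # no penalty applied
infeasible_value = penalized_objective(p, [g1], [], [0.0, 0.0])   # violation of 1.0
```

With such a function, an infeasible solution still receives a fitness value and can guide the search toward the feasible region.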


4.4 Multimodal Optimization 


When a GA is run several times on the same problem, it may return different solutions in
different runs. The balance between selection pressure and population diversity, which is
a perennial topic in constructing and analyzing GAs, can be adjusted with the use of
multimodal optimization approaches. When there are numerous local optimal solutions in a
solution space, the term "multimodality" is used to characterize the situation [43-45].
In the simplest case, the algorithm aims to identify the one and only global optimal
solution as quickly as possible. Multimodal problems are typically stated as unconstrained
maximization problems whose objective values are all nonnegative:

$$\max f(x) \ge 0, \quad x \in \mathbb{R}^n$$

Multimodal problems can take several forms. In some situations we are looking for all
global optimal solutions when there are many of them; in others we are also interested in
several good local solutions. The reasons for seeking several global optimal solutions and
other interesting local optimal solutions are summarized below:


* Some factors, such as dependability, manufacturing complexity, maintenance
complexity, and so on, are difficult to describe adequately in implementations.
Finding multiple solutions of equivalent quality gives the decision maker a
number of options from which to pick, based on these other fuzzy factors.


* Having several effective alternatives available helps both in finding an effective
solution and in doing a sensitivity analysis of a problem.


* Counteracting the effects of genetic drift requires a careful trade-off between
population diversity and selection pressure.


* In the event that the algorithm's search ability is insufficient to guarantee the
discovery of the global optimum solution, the ability to locate several solutions
of equivalent high quality increases the chance of obtaining the global optimal
solution.


All in all, multimodal EAs must locate and keep up numerous (global/local) optimal 
solutions within multimodal domains. 
4.5 Multiobjective Optimization 


EAs are developed to address problems that arise in the real world, such as design and
scheduling. Real-world situations usually impose several requirements that must be met.
Because it is challenging to compare two objectives at the same time, we sometimes
express some of them as constraints [46-49]. Based on Pareto's concept of dominance, we
can categorize the relationship between two vectors into three groups: the first is better
than the second, the second is better than the first, or they are not comparable. GAs are
used in multiobjective optimization to address such problems. This is an exciting and
popular scientific field. We first define the problem before explaining the terminology.
Consider the following optimization problem:


$$
\begin{aligned}
\min\; & \{p_1(x), p_2(x), \ldots, p_m(x)\} \quad \text{for } x \in \mathbb{R}^n \\
\text{such that}\; & q_i(x) \le 0, \quad i = 1, 2, \ldots, k \\
& r_j(x) = 0, \quad j = k+1, k+2, \ldots, l
\end{aligned}
$$


where x is the decision variable, $p_i$ is objective i, $q_i$ is inequality constraint i,
and $r_j$ is equality constraint j.
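The three-way comparison based on Pareto dominance can be sketched as follows, assuming that all objectives are minimized; the function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(ai <= bi for ai, bi in zip(a, b))
            and any(ai < bi for ai, bi in zip(a, b)))

def compare(a, b):
    """Classify the relationship between two objective vectors."""
    if dominates(a, b):
        return "a is better"
    if dominates(b, a):
        return "b is better"
    return "not comparable"      # identical vectors also end up here

def nondominated(points):
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

front = nondominated([(1, 3), (2, 2), (3, 3)])
```

Here `(3, 3)` is dominated by `(2, 2)`, while `(1, 3)` and `(2, 2)` are not comparable, so both remain on the front.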


4.6 Combinatorial Optimization 


Parameter optimization, that is, how to determine the best values for variables so as to
obtain the maximum or minimum value of the objective function, is covered in earlier
chapters. Not all problems in the real world are like this [50, 51]. It is frequently
necessary to pick a few elements from a collection and arrange them in such a way that the
objective function takes its maximum or minimum value. These problems fall under the
category of combinatorial optimization.


4.6.1 Differential Evolution 


Storn and Price first presented this stochastic, population-based optimization approach
in 1996 to address nonlinear optimization problems [52-54]. The differential evolution
(DE) methodology is a parallel direct search technique that makes use of NP D-dimensional
parameter vectors $x_{i,G}$, $i = 1, 2, \ldots, NP$, as the population of each generation
G. The initial vector population is chosen at random and should cover the entire parameter
space. All random decisions are drawn from a uniform probability distribution. If a
preliminary solution is known, the initial population might be produced by adding normally
distributed random deviations to the nominal solution $x_{nom,0}$.

Figure 4.10 shows the overall flow diagram of DE. DE creates new parameter vectors by
adding the weighted difference between two population vectors to a third vector; this
operation is called mutation. The parameters of the mutated vector are then mixed with the
parameters of another predetermined vector, the target vector, to create the so-called
trial vector. Parameter mixing is frequently referred to as crossover. If the trial vector
yields a lower cost function value than the target vector, it replaces the target vector
in the following generation. This last operation is called selection. Every population
vector has to serve as the target vector exactly once, so that NP competitions take place
within a single generation.


FIGURE 4.10
Flow diagram of differential evolution.
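The mutation, crossover, and selection steps described above can be sketched as the variant often called DE/rand/1/bin. The control parameter values (F = 0.5, CR = 0.9, NP = 20) and the function name are assumptions of this example, and the sketch does not clip mutants to the bounds.

```python
import random

def differential_evolution(cost, bounds, NP=20, F=0.5, CR=0.9, generations=150, seed=0):
    """Minimal DE/rand/1/bin sketch; requires NP >= 4. Returns (best vector, best cost)."""
    rng = random.Random(seed)
    D = len(bounds)
    # Initial population drawn uniformly over the whole parameter space.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    costs = [cost(x) for x in pop]
    for _ in range(generations):
        new_pop, new_costs = [], []
        for i in range(NP):                      # each vector is the target exactly once
            r1, r2, r3 = rng.sample([j for j in range(NP) if j != i], 3)
            # Mutation: weighted difference of two vectors added to a third.
            mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(D)]
            # Crossover: mix mutant and target parameters into the trial vector.
            j_rand = rng.randrange(D)            # at least one parameter from the mutant
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(D)]
            # Selection: the trial replaces the target only if it has lower cost.
            c = cost(trial)
            if c < costs[i]:
                new_pop.append(trial)
                new_costs.append(c)
            else:
                new_pop.append(pop[i])
                new_costs.append(costs[i])
        pop, costs = new_pop, new_costs
    best = min(range(NP), key=costs.__getitem__)
    return pop[best], costs[best]

def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```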


4.6.1.1 Suitability of DE in the Field of Optimization 


Users typically demand that a workable minimization strategy meet the following
criteria:


* The ability to handle cost functions that are nondifferentiable, multimodal, and
nonlinear.

* Few control variables to steer the minimization. These variables should be
robust and easy to choose.

* Good convergence properties, that is, consistent convergence to the global
minimum in consecutive independent trials.
5 


Modeling of ANFIS (Adaptive Fuzzy 
Inference System) System 


5.1 Introduction 


Soft computing techniques like neural networks (NN), fuzzy logic (FL), and genetic algo- 
rithms (GA) draw their inspiration from biological computational processes and nature's 
approaches to problem-solving [1-3]. NNs are simplified representations of the human 
nervous system that imitate our capacity for situational adaptation and experience-based 
learning. Chapter 3 of the book discusses three NN systems, each representing one of the three 
major classes of NN architectures, namely single layer feedforward, multilayer feedfor- 
ward, and recurrent network architectures. The backpropagation network is a multilayer 
feedforward network architecture with gradient descent learning. Associative memories 
are single layer feedforward or recurrent network architectures adopting Hebbian learn- 
ing [4, 5]. ART networks are recurrent network architectures with a kind of competitive 
learning termed adaptive resonance theory. Fuzzy logic systems address the imprecision
or vagueness in the input and output descriptions of systems by means of fuzzy sets.
Fuzzy sets have no clearly defined boundaries and provide a gradual transition between
membership and non-membership of an element in the set [6]. In Chapter 2 of the book,
fuzzy logic's foundational ideas and applications are covered. Genetic algorithms (GAs),
inspired by the process of biological evolution, are adaptive search and optimization
algorithms. Chapter 4
of the book discusses the basic concepts, namely the genetic inheritance operators and 
applications of GA. Each technology has successfully solved a variety of issues originating 
from various domains on its own terms and merit [7]. At the same time, as mentioned in 
Chapter 1, various attempts have been successfully made to synergize the three different 
technologies in whole or in part, to solve problems for which these technologies could not 
find solutions individually. By properly integrating them, synergy or hybridization aims 
to overcome the drawbacks of one technology's application while utilizing the advantages 
of the other. When one technology applied alone has been unable to produce an effective 
solution, the complexity of the problem has more typically called for a careful mix of the 
technologies [8-10]. 
5.2 Hybrid Systems 


Hybrid systems are ones that combine more than one technique to address the issue. 
Hybrid systems can be classified into the following categories [11]: 


1. Sequential hybrids 
2. Auxiliary hybrids 
3. Embedded hybrids 


5.2.1 Sequential Hybrid Systems 


Sequential hybrid systems work in a pipeline-like fashion [12, 13]: the output of one
technology becomes the input of another. Sequential hybrids are one of the weakest types
of hybridization because they lack an integrated fusion of the technologies. For example,
a GA preprocesses the data to obtain the optimal parameters of a given problem and passes
them on to an NN model for further processing.


5.2.2 Auxiliary Hybrid Systems 


In this scenario, one technology calls the other as a "subroutine" to process or
manipulate the information it requires [14-16]. The first technology's information is
processed by the second technology before being passed on for additional usage. Despite
being superior to sequential hybrids, this sort of hybridization is only considered to be
of an intermediate degree. As an illustration, consider a neurogenetic system in which an
NN uses a GA to optimize the parameters that define its architecture.


5.2.3 Embedded Hybrid Systems 


The involved technologies in embedded hybrid systems are integrated to the point that 
they appear to be entangled [17-19]. It would seem that no technology can be employed to 
solve the problem without the others because the fusion is so complete. The hybridization 
is complete in this case. An NN, for instance, may be part of a NN-FL hybrid system that 
processes fuzzy inputs and also extracts fuzzy outputs. 


5.3 Neuro-Fuzzy Hybrids 


Neuro-fuzzy hybrids are among the hybrid system types that have received the most
attention and have produced a very large number of papers and research findings [20].
Fuzzy logic and neural networks are two different approaches to dealing with uncertainty.
Each of them has strengths and weaknesses of its own. Neural networks are well suited for
classifying phenomena into specified groups because they can model complex nonlinear
interactions [21]. On the other hand, the precision of their outputs is frequently
limited: training minimizes the least squares error but does not reduce it to zero. In
addition, the training time of an NN can be rather long [24], and the training data must
be carefully chosen to span the whole range over which the various variables are expected
to change [22, 23].

This hybridization can be viewed from two perspectives. The first is to add fuzzy
functionality to NNs to increase the expressiveness and flexibility of a network in
uncertain circumstances. The second is to give fuzzy systems neural learning capabilities
so that they become more adaptive to their environment. This method is sometimes referred
to as "NN driven fuzzy reasoning" in the literature [25, 26].


5.3.1 Adaptive Neuro-Fuzzy Inference System (ANFIS) 


ANFIS combines the ANN and fuzzy logic soft computing techniques [27]. Fuzzy logic can
transform the qualitative aspects of human knowledge and insight into a process of precise
quantitative analysis. However, it lacks a clearly defined mechanism for converting human
reasoning into a rule-based FIS and for adjusting the membership functions (MFs) [28]. An
ANN, by contrast, has a higher capacity for learning to adapt to its surroundings. As a
result, the ANN can be used to adjust the MFs automatically and to reduce the error rate
while the rules are being determined [29].


5.3.1.1 Fuzzy Inference System (FIS) 


A FIS is built from three main components: a rule base, which contains the selected fuzzy
logic "if-then" rules; a database, which defines the membership functions of the fuzzy
sets; and a reasoning mechanism, which performs fuzzy inference on the rules to produce
the output. The detailed structure of the FIS is depicted in Figure 5.1. When a
real-valued (crisp) input is fuzzified by its membership function, it is converted into a
fuzzy value in the range between 0 and 1 [30]. The knowledge base comprises the rule base
and the database, both of which are crucial components in decision-making. A fuzzy
database typically contains definitions, such as details on fuzzy set parameters with
defined functions. In order to create a database, a universe of discourse must normally be
defined, the number of linguistic values to be used for each linguistic variable must be
determined, and a membership function must be established. The rule base contains fuzzy
logic operators and "if-then" conditional statements. The rules can be created manually or
automatically, in which case they are searched for using numerical input-output data. FIS
comes in a variety of forms, including Mamdani, Takagi-Sugeno, and Tsukamoto [31, 32].
ANFIS is most frequently implemented with the Takagi-Sugeno FIS model.
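Fuzzification can be illustrated with triangular membership functions. The linguistic variable "temperature", its three linguistic values, and their parameters are hypothetical and serve only as an example of a small fuzzy database.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical database: one linguistic variable with three linguistic values.
temperature_sets = {
    "cold": (-10.0, 0.0, 15.0),
    "warm": (5.0, 20.0, 30.0),
    "hot": (25.0, 35.0, 50.0),
}

def fuzzify(x, fuzzy_sets):
    """Map a crisp input to a membership grade in [0, 1] for every fuzzy set."""
    return {label: triangular(x, *abc) for label, abc in fuzzy_sets.items()}

degrees = fuzzify(10.0, temperature_sets)   # partly "cold", partly "warm", not "hot"
```

The crisp value 10.0 belongs to "cold" and "warm" to the same degree (one third each), which shows the gradual transition between fuzzy sets.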


5.3.1.2 Adaptive Network 


One of the best examples of a multilayer feedforward neural network is the adaptive
network shown in Figure 5.2. These networks frequently employ supervised learning
algorithms during the learning phase. In this architecture, a number of adaptive nodes are
connected directly to one another without any weight values on the connections [33]. The
output of each node depends on the incoming signals and on the node's parameters, and the
nodes perform different tasks and have different roles. The learning rule adjusts the node
parameters so as to reduce the error at the adaptive network's output [34]. Gradient
descent and back-propagation are still commonly used as learning techniques in adaptive
networks. However, the back-propagation method has known weaknesses that limit the
capability and precision of adaptive networks in making decisions. Major issues with the
back-propagation algorithm include its slow convergence rate and its tendency to become
stuck in local minima. As a result, [35] suggested a hybrid learning algorithm as an
alternative learning method. This method has a greater capacity to accelerate convergence
and to avoid becoming trapped in local minima.


FIGURE 5.1
Fuzzy inference systems.


FIGURE 5.2
Adaptive network.


5.4 ANFIS Architecture 


The ANFIS technique is used to handle nonlinear and complex problems. In ANFIS hybrid
intelligent systems, a simple data set is mapped to the desired output of the fuzzy logic
controller (FLC) through interconnected NN processing elements and weighted information
connections [36, 37]. ANFIS combines the strengths of the two intelligent techniques, FLC
and NN, in a single technique. In the ANFIS model, the parameters of a FIS are tuned by NN
learning methods. A five-layer ANFIS structure is shown in Figure 5.3.


The main advantages of ANFIS are as follows:


* It refines fuzzy if-then rules to describe the behavior of a nonlinear and
complex system.

* It doesn't require prior human expertise.

* It is simple to implement.

* It enables fast and accurate learning and reasoning.

* Because of a proper choice of suitable membership functions, strong generalization
ability, and excellent explanation through fuzzy rules, it offers the desired output.

* It is simple to incorporate both linguistic and numeric knowledge for
problem-solving.


FIGURE 5.3
ANFIS architecture for five layers.


FIGURE 5.4
Firing of rules for different membership functions.


Figure 5.3 shows the ANFIS architecture implemented with two rules, where fixed nodes and
adaptive nodes are represented by circles and squares, respectively. Figure 5.4 represents
the first-order Sugeno model of the ANFIS architecture. In Layer 1, every node is
adaptive, and the outputs of Layer 1 are the fuzzy membership grades of the inputs. In
Layer 2, the nodes are fixed and combine the incoming membership grades with the fuzzy AND
operator; the symbol Π indicates that each node acts as a simple multiplier. The outputs
of Layer 2 are the firing strengths of the rules. In Layer 3, the fixed nodes, labeled N,
normalize the firing strengths from the previous layer. In Layer 4, the nodes are
adaptive, and the output of every node is the product of the normalized firing strength
and a first-order polynomial. Layer 5 consists of a single fixed node, labeled Σ, which
sums all incoming signals.
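The five layers can be traced in a short forward-pass sketch for two inputs and two rules. Gaussian membership functions and all parameter values are assumptions made for this example; the text does not fix the shape of the membership functions.

```python
import math

def gaussian_mf(x, c, s):
    """Gaussian membership function with center c and width s (assumed shape)."""
    return math.exp(-0.5 * ((x - c) / s) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, first-order Sugeno ANFIS; rule i uses A_i and B_i."""
    # Layer 1 (adaptive): membership grades of the inputs.
    mu_A = [gaussian_mf(x, c, s) for c, s in premise["A"]]
    mu_B = [gaussian_mf(y, c, s) for c, s in premise["B"]]
    # Layer 2 (fixed, product): firing strength of each rule (fuzzy AND).
    w = [a * b for a, b in zip(mu_A, mu_B)]
    # Layer 3 (fixed, N): normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4 (adaptive): normalized strength times first-order polynomial p*x + q*y + r.
    weighted = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5 (fixed, sum): overall output.
    return sum(weighted), w_bar

premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
output, strengths = anfis_forward(1.0, 1.0, premise, consequent)
```

At the input (1, 1) both rules fire equally, so the output is the average of the two rule polynomials, (2 + 3) / 2 = 2.5.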
5.4.1 Hybrid Learning Algorithm 


The hybrid learning algorithm of ANFIS combines the least squares method and the gradient
descent method [38]. In the forward pass, the node outputs are propagated forward up to
Layer 4 and the consequent parameters are determined by the least squares method. In the
backward pass, the error signals are propagated back to the previous layers and the
premise parameters are updated by gradient descent. The hybrid learning approach is faster
than the original back-propagation technique because it reduces the dimension of the
search space. The least squares method yields the optimal estimate of the consequent
parameters as long as the premise parameters are held fixed. When the premise parameters
are also variable, the search space of the learning process grows and, consequently,
training converges more slowly [39, 40]. To tackle the issue of search space, the hybrid
algorithm of ANFIS therefore joins the two techniques: (1) the least squares method, which
optimizes the consequent parameters, and (2) the gradient descent method, which optimizes
the premise parameters. The literature shows that the hybrid algorithm is highly efficient
in training ANFIS systems. The hybrid ANFIS learning process is shown in Figure 5.5.
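The least squares half of the hybrid algorithm can be sketched for a one-input model with two rules. With the premise parameters held fixed, the output is linear in the consequent parameters [p1, r1, p2, r2], so they follow from the normal equations. The membership parameters, the data, and the plain Gaussian elimination used here (in place of a recursive least squares estimator) are assumptions of this example.

```python
import math

def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def normalized_strengths(x, centers=(0.0, 2.0), sigma=1.0):
    """Layers 1 to 3 with fixed (assumed) Gaussian premise parameters."""
    w = [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]
    total = sum(w)
    return [wi / total for wi in w]

def predict(x, theta):
    """Model output for consequent parameters theta = [p1, r1, p2, r2]."""
    w1, w2 = normalized_strengths(x)
    return w1 * (theta[0] * x + theta[1]) + w2 * (theta[2] * x + theta[3])

def fit_consequents(xs, targets):
    """Least squares estimate of [p1, r1, p2, r2] via the normal equations."""
    rows = []
    for x in xs:
        w1, w2 = normalized_strengths(x)
        rows.append([w1 * x, w1, w2 * x, w2])
    n = 4
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(AtA, Atb)

true_theta = [1.0, 0.5, -2.0, 3.0]                 # synthetic "unknown" parameters
xs = [i * 0.25 - 2.0 for i in range(25)]           # inputs from -2 to 4
targets = [predict(x, true_theta) for x in xs]
theta = fit_consequents(xs, targets)
```

In the full hybrid algorithm, this step would alternate with a gradient descent update of the premise parameters (here the centers and the width) in the backward pass.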


5.4.2 Derivation of Fuzzy Model 


As previously mentioned, for a given set of rules the identification of an optimal fuzzy
model with respect to the training data reduces to a linear least-squares estimation
problem in the ANFIS model [41]. This suggests a quick and reliable method for identifying
fuzzy models from input-output data. When creating a fuzzy model from data, this method
selects the important input variables by combining the cluster estimation method (CEM)
with the least squares estimation method (LSM). The method has two steps [42]: (1) In the
first stage, an initial fuzzy model is derived from a given set of input and output data
using the cluster estimation method. (2) In the second stage, the significance of each
variable in the initial fuzzy model is tested in order to determine the important input
variables.


5.4.2.1 Extracting the Initial Fuzzy Model 


An initial fuzzy model must be derived in order to begin the modeling process. This model
is necessary to determine the final fuzzy model's number of inputs, linguistic variables,
and, consequently, rules. The initial model is also needed to select the model selection
criteria and the input variables for the final model before the final optimal model can be
developed. Depending on the provided fuzzy rules, either the subtractive clustering
technique or the grid partitioning method [43-45] can be used to obtain the initial fuzzy
model.


FIGURE 5.5
Hybrid learning process.


5.4.2.2 Subtractive Clustering Technique 


To obtain the initial fuzzy model by subtractive clustering, the method is first applied
to the input-output data pairs collected from the system to be modeled [46]. The cluster
estimation technique finds the cluster centers of the input-output data pairs. Because
each cluster center indicates the existence of a rule, this makes it easier to determine
the rules scattered in the input-output space. It also helps to determine initial values
for the premise parameters. This is significant because, during neural network training,
the model converges quickly to its final value if the initial value is close to the final
value [47]. In this clustering method, the potential of each input-output data point is
computed from its Euclidean distances to the other data points. Points whose potential
exceeds a particular threshold value become cluster centers. Once the cluster centers have
been determined, the initial fuzzy model can be extracted, because the centers also
indicate how many linguistic variables there are (Figure 5.6).

Consider a set of n data points $\{x_1, x_2, x_3, \ldots, x_n\}$ in an M-dimensional
space. It is assumed that the data points have been normalized in each dimension so that
they are bounded by a unit hypercube [48]. Every data point is regarded as a possible
cluster center. The potential $P_i$ of data point $x_i$ is defined as


$$P_i = \sum_{j=1}^{n} e^{-\alpha \lVert x_i - x_j \rVert^2} \tag{5.1}$$


where $\alpha = 4 / r_a^2$, $\lVert x_i - x_j \rVert$ is the Euclidean distance, and $r_a$
is a positive constant. If $x_1^*$ is the location of the first cluster center and $P_1^*$
is its potential value, then the revised potential of each data point $x_i$ is given by


$$P_i = P_i - P_1^* e^{-\beta \lVert x_i - x_1^* \rVert^2} \tag{5.2}$$


where $\beta = 4 / r_b^2$ and $r_b$ is a positive constant. As a result, each data point
has potential deducted from it based on how far it is from the first cluster center.
Depending on how far each data point is from the second cluster center, its potential is
further diminished. In general, after the kth cluster center has been obtained, the
potential of each data point is revised by the formula


$$P_i = P_i - P_k^* e^{-\beta \lVert x_i - x_k^* \rVert^2} \tag{5.3}$$


where $x_k^*$ is the location of the kth cluster center and $P_k^*$ is its potential value.
FIGURE 5.6 


Flowchart of ANFIS model training process [36]. 
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The process is acquiring a new cluster center and revising potentials repeats until the 
stopping criterion P; < 0.15 Př [49] is satisfied. It is considered that (xi, x5, ..... x; } is a set of 
c cluster centers in an M dimensional space. Cluster arranged in such a way that if first N 
dimensions present input variables, then the last M—N dimensions correspond to output 
variables. Each vector x; is divided into two components y; and z;, where y; is the location 
of the center of the input cluster and z; is the location of the center of the output cluster. 
Therefore x; may be represented as x; = [y;; z;]. The degree of fulfillment of the rule can be 
defined as 


2 


weet 64) 
Where a is a constant defined by Eq. (5.5). Output vector z is computed as 
j pizi 
z= El — (5.5) 
Hi 
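
The clustering procedure of Eqs. (5.1) to (5.3) can be illustrated with a short Python sketch. It is not the MATLAB implementation used for the results in this book: the function name `subtractive_clustering`, the default radii `r_a` and `r_b`, and the sample data are assumptions made only for illustration, while the factor 0.15 is the stopping criterion quoted in the text. Each data point is a tuple that is assumed to be already normalized to the unit hypercube.

```python
import math


def subtractive_clustering(points, r_a=0.5, r_b=0.75, eps=0.15):
    """Pick cluster centers by potential (illustrative sketch, assumed defaults)."""
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / r_b ** 2

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Eq. (5.1): initial potential of every data point.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]

    centers = []
    first_peak = max(potential)
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[k]
        # Stop once the best remaining potential drops below eps * P_1^*.
        if peak < eps * first_peak:
            break
        centers.append(points[k])
        # Eqs. (5.2) and (5.3): subtract the influence of the new center.
        potential = [p_i - peak * math.exp(-beta * sq_dist(x, points[k]))
                     for p_i, x in zip(potential, points)]
    return centers


data = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
        (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
print(subtractive_clustering(data))
```

For two tight groups of points such as these, the sketch selects one center per group. The number of centers returned is what fixes the number of rules in the initial fuzzy model.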


5.4.2.3 Grid Partitioning Technique 


The second technique for defining the rules of the initial fuzzy model is grid partitioning
[50-52]. This strategy is employed when the number of inputs and membership functions is
small, since the number of fuzzy regions grows as the product of the membership-function
counts. To build the antecedents of the fuzzy rules, the input space is divided into a
number of fuzzy regions. Figure 5.7 shows the grid-partitioned fuzzy space for a two-input
model with three membership functions per input, which gives nine regions. The two
dimensions represent the abscissa and ordinate of the input space. The rules obtained
using either of the two techniques are then refined using Jang's ANFIS methodology [36].


FIGURE 5.7 
Grid search partitioning techniques for two input variables with three membership functions.


5.4.2.4 C-Means Clustering


Clustering divides a dataset into groups so that similar data points belong to the same
cluster while dissimilar data points are assigned to different clusters [53]. In this study,
fuzzy C-means (FCM) clustering was employed to find a flaw automatically. FCM, an
unsupervised algorithm, is one of the most widely used fuzzy clustering techniques. As an
enhancement and modification of K-means clustering, FCM partitions a dataset of data points
$x_i$ into C clusters by minimizing the objective function U. The proposed method reduces
the number of membership functions and rules by using FCM clustering [54]. The objective
of the FCM algorithm is very similar to that of the k-means algorithm and is given in
Eq. (5.6):

$$U = \sum_{j=1}^{C} \sum_{x_i \in C_j} u_{ij}^{m} \, \lVert x_i - \mu_j \rVert^2 \qquad (5.6)$$

where $u_{ij}$ is the degree to which observation $x_i$ belongs to cluster $C_j$, and $\mu_j$ is
the center of cluster j. The variable $u_{ij}$ is defined in Eq. (5.7):

$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\lVert x_i - \mu_j \rVert}{\lVert x_i - \mu_l \rVert} \right)^{2/(m-1)}} \qquad (5.7)$$

The parameter m determines the degree of cluster fuzziness and is a real value greater
than 1 ($1 < m < \infty$). A value of m close to 1 results in a cluster solution that is
close to the solution of hard clustering, such as k-means, while a value of m close to
infinity results in total fuzziness. The centroid of a cluster in fuzzy clustering is the
average of all points, weighted according to how much they belong to the cluster, as shown
in Eq. (5.8):

$$\mu_j = \frac{\sum_{x_i \in C_j} u_{ij}^{m} \, x_i}{\sum_{x_i \in C_j} u_{ij}^{m}} \qquad (5.8)$$
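
A minimal Python sketch of the FCM iteration may help to make the membership and centroid equations concrete. It alternates between the membership update of Eq. (5.7) and the centroid update of Eq. (5.8), summing over all data points as is usual for FCM. The function names, the fixed iteration count, the small distance floor, and the one-dimensional sample data are assumptions for illustration and are not taken from the study described here.

```python
def distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def memberships(points, centers, m):
    """Eq. (5.7): membership of every point in every cluster."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # A small floor avoids division by zero when a point sits on a center.
        d = [max(distance(x, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d_j / d_l) ** exponent for d_l in d) for d_j in d])
    return u


def update_centers(points, u, m):
    """Eq. (5.8): centers as membership-weighted means of the points."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(w_i * x[k] for w_i, x in zip(w, points)) / total
            for k in range(dims)))
    return centers


def fcm(points, centers, m=2.0, iterations=50):
    """Run a fixed number of FCM iterations from the given initial centers."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


data = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
centers, u = fcm(data, [(0.0,), (1.2,)])
print(centers)
```

Each row of `u` sums to 1, so a point between two groups would receive partial membership in both clusters instead of a hard assignment.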
6


Machine Learning Techniques for Cognitive Modeling


6.1 Introduction 


Humans are born with the ability to learn. As a result, people perform better over time
when carrying out similar tasks [1-10]. This chapter outlines learning concepts that can be
applied to machines to enhance their performance. Machine learning can be grouped into
three main categories: supervised learning, unsupervised learning, and reinforcement
learning [2, 11-14]. Supervised learning requires a trainer, who provides the input-output
training examples. The learning system adjusts its parameters so as to produce the
appropriate output pattern for a given input pattern [1-4]. Without a trainer, the learner
must adjust its parameters on its own because the expected output for a given input
instance is unknown; this is called unsupervised learning. Reinforcement learning is a
third category that lies between the supervised and unsupervised categories. The learner
in reinforcement learning is not given explicit input-output instances, but it receives
feedback from its environment [5, 6].




6.2 Classification of Machine Learning 


With the help of the feedback signals, the learner in reinforcement learning can determine
whether its actions on the environment are rewarding or punitive, and it adjusts its
parameters based on the status of its actions. The most popular supervised learning
strategies are inductive and analogical learning. Decision tree learning and version
space-based learning are the inductive learning techniques discussed in this chapter.
Illustrative examples are used to briefly introduce analogical learning. A clustering
problem is used to explain the idea of unsupervised learning. Temporal difference (TD)
learning and Q-learning are included in the section on reinforcement learning. A fourth
category of learning, inductive logic programming (ILP), has recently been identified in the
field of knowledge engineering. Figure 6.1 shows the basic classification of machine
learning techniques.

In this chapter, the fundamentals of inductive logic programming are also briefly
covered. A brief explanation of the computational theory of learning concludes the chapter.
With this theory as a foundation, it is possible to gauge how well a machine can learn
from a given number of training examples.

FIGURE 6.1
Classification of machine learning.


6.2.1 Supervised Learning 


As already stated, in supervised learning a trainer provides input and output examples,
and the learner is responsible for adjusting the system's parameters so that it produces
the right output pattern when stimulated by one of the provided input patterns [15-17].
This section discusses two key supervised learning styles: inductive learning and
analogical learning [18-21]. Several other supervised learning methods based on neural
networks are discussed in the following chapter. Figure 6.2 shows the different types of
supervised learning methods.


6.2.1.1 Inductive Learning 


In inductive learning, the challenge is to estimate the function f from given input
samples x and output samples f(x). The aim is to generalize from the samples so that the
resulting mapping is useful for estimating the output for new samples in the future
[22, 23]. Using the supervised learning techniques known as inductive learning, a
hypothesis h with $h(x_i) \approx f(x_i)$ can be developed for a given set of
$(x_i, f(x_i))$ pairs.

Depending on the nature of the $(x_i, f(x_i))$ dataset, neural learning techniques may be
employed. The learning algorithm for such numerical sets $\{x_i, f(x_i)\}$ must be able to
adapt the parameters of the learner much more effectively than curve-fitting techniques.
The number of adaptations increases with the number of training instances. Learning by
decision tree (DT) and learning by version space are two significant subtypes of inductive
learning.
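
The idea of adapting the learner's parameters once per training instance can be sketched in a few lines of Python. The linear form of the hypothesis, the learning rate, the number of passes, and the sample pairs below are all assumptions chosen for illustration; the chapter itself does not prescribe a particular hypothesis class.

```python
def learn_hypothesis(samples, lr=0.05, epochs=500):
    """Fit h(x) = w * x + b to (x, f(x)) pairs with per-sample updates."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, fx in samples:
            error = (w * x + b) - fx  # h(x) - f(x) for this training instance
            w -= lr * error * x       # one parameter adaptation per instance
            b -= lr * error
    return lambda x: w * x + b


# Training pairs generated from the (assumed) target f(x) = 2x + 1.
h = learn_hypothesis([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
print(h(4.0))  # estimate for an unseen sample
```

The learned hypothesis is then used to estimate the output for a sample that was not in the training set, which is the purpose of inductive learning described above.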


FIGURE 6.2
Types of supervised learning algorithm. (Diagram nodes: supervised learning, inductive
learning, version space, decision tree, analogical learning.)


6.2.1.2 Learning by Version Space


The version space learning technique, proposed by Mitchell in the early 1980s, is one of
the earliest types of inductive learning (IL) [24, 25]. All the useful information can be
recalled through the hierarchical representation of knowledge in a version space. The
version space approach maintains many candidate models within a version space to
facilitate concept learning [26, 27].


Characteristics 


* Version spaces are used to convey tentative heuristics.
* A version space is a representation of every possible heuristic description.
* A description is considered credible if it holds true for all known positive examples
while holding true for no known negative examples.
* Two complementary trees make up a version space description: one with nodes
related to overly general models, the other with nodes related to overly specialized
models.
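
The credibility test in the list above can be sketched as follows. The sketch enumerates every conjunctive description over two small attribute domains and keeps those that cover all positive examples and no negative example. The attribute values, the `?` wildcard notation, and the function names are assumptions for illustration; a practical version space learner maintains only the most general and most specific boundaries instead of enumerating every description.

```python
from itertools import product


def covers(hypothesis, example):
    """A hypothesis covers an example if every attribute matches or is '?'."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, example))


def version_space(domains, positives, negatives):
    """Keep every description consistent with the known examples."""
    candidates = product(*[list(values) + ["?"] for values in domains])
    return [h for h in candidates
            if all(covers(h, p) for p in positives)
            and not any(covers(h, n) for n in negatives)]


domains = [("small", "large"), ("red", "blue")]
space = version_space(domains,
                      positives=[("small", "red")],
                      negatives=[("large", "blue")])
print(space)
```

With one positive and one negative example, three descriptions remain. The fully general description `("?", "?")` is eliminated because it also covers the negative example.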


6.2.1.3 Learning by Decision Tree (DT) 


A decision tree receives a set of attributes (or properties) of an object as input and
produces a binary decision of true or false [28]. As a result, decision trees typically
represent Boolean functions. Outputs that take continuous values, rather than only the
values 0 and 1, are also permitted [29]. However, we restrict ourselves to Boolean outputs
for the sake of simplicity. Each node of a decision tree represents a test of an attribute
of the instance, and each branch descending from that node represents one of the
attribute's possible values [28, 30]. To illustrate the use of a decision tree, we consider
a number of instances, some of which produce a true value for the decision; these are the
positive instances. An instance for which the decision is false, on the other hand, is
called a negative instance [31]. Now consider the problem of deciding whether a bird can
fly. Figure 6.3 shows the decision tree for whether a bird will be able to fly or not.
FIGURE 6.3
Decision tree for a bird learning to fly or not. 
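
A Boolean decision tree of this kind can be represented as nested attribute tests. In the Python sketch below, the attribute names `has_wings` and `wings_broken` are hypothetical and chosen only for illustration; they are not taken from Figure 6.3. Each internal node tests one attribute, each branch corresponds to one value of that attribute, and each leaf holds the true or false decision.

```python
# Hypothetical tree: internal nodes are dicts, leaves are Boolean decisions.
TREE = {
    "attribute": "has_wings",
    "branches": {
        False: False,
        True: {
            "attribute": "wings_broken",
            "branches": {True: False, False: True},
        },
    },
}


def classify(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attribute"]]]
    return tree


positive = {"has_wings": True, "wings_broken": False}
negative = {"has_wings": True, "wings_broken": True}
print(classify(TREE, positive), classify(TREE, negative))
```

The first instance is a positive instance because the decision is true, and the second is a negative instance.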
6.2.1.4 Analogical Learning


A problem can have both positive and negative examples, and in inductive learning the
learner must develop a concept that covers the majority of the positive examples
while excluding the negative ones [32, 33]. Inductive learning therefore requires a
number of training examples in order to form a concept. Analogical learning, however,
can succeed with just one example. For instance, suppose one must determine the
plural form of “bacillus” given a single training example of a similar word and its
plural [34, 35]. From that example, the analogical learning system learns that words
ending in “us” form their plural by substituting “i” for “us.”
The main steps of analogical learning are formalized as follows.


1. Identifying analogy: Determine whether a new problem and an experienced prob- 
lem instance are comparable. 


2. Establishing the mapping function: The mapping is established after relevant com- 
ponents of the encountered problem are chosen. 


3. Implementation of mapping function: A specific domain can be switched to a new 
domain using the mapping function. 


4. Validation: Through trial procedures such as theorem proving or simulation, the newly
developed solution is validated for its applicability [34].


5. Learning: After the validation is over, new data is encoded and saved for further 
use. 
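
The steps above can be illustrated with the plural-form example. In the Python sketch below, the training pair "fungus" and "fungi" is an assumed example, since the chapter does not state which pair is used, and the function names are hypothetical. The mapping function is established from the single known pair and then applied to the new word.

```python
def learn_suffix_mapping(singular, plural):
    """Establish the mapping: strip the common prefix, keep the differing endings."""
    i = 0
    while i < min(len(singular), len(plural)) and singular[i] == plural[i]:
        i += 1
    return singular[i:], plural[i:]


def apply_mapping(word, mapping):
    """Apply the mapping only if the new word is analogous (same ending)."""
    old, new = mapping
    if not word.endswith(old):
        return None  # no analogy identified
    return word[:len(word) - len(old)] + new


mapping = learn_suffix_mapping("fungus", "fungi")  # one training example
print(mapping, apply_mapping("bacillus", mapping))
```

Returning `None` for a word with a different ending corresponds to the first step, in which the learner checks whether the new problem is comparable to the experienced one. Validation and storage of the result are left out of this sketch.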


6.2.2 Unsupervised Learning 


In supervised learning, where input-output examples are provided, the learner is
required to build a mapping function that yields the right output for a specific input
pattern [36, 37]. In unsupervised learning, however, no trainer is involved, so the learner
has to discover the mapping by experimenting on the environment. The environment
is responsive, but it does not distinguish between actions that are rewarding and those
that are punitive. Because the objectives or desired outputs are not specified in the
training information, the environment is unable to assess how far the learner's actions
have progressed toward them [38]. Experimentation is one of the simplest ways to build
concepts through unsupervised learning. Consider the following scenario: a toddler throws
a ball at a wall; the ball bounces and comes back. The toddler learns the “principle of
bouncing" after repeating this experiment several times. This is an illustration of
unsupervised learning, and many scientific laws were discovered in this way [38-40].

To demonstrate the fundamentals of concept development through unsupervised learning,
consider another example. Suppose that we want to group animals according to their speed
and their height-to-weight ratio. We measure these characteristics for sample animals and
plot them on a two-dimensional frame. Figure 6.4 shows that tigers, foxes, cows, and dogs
fall into different classes, although the dog and fox classes overlap. Now, given the
measured speed and height/weight ratio of an unknown animal, we can readily identify it,
as long as it does not fall into an overlapped region [39]. An overlapped region cannot
identify the animal because these two features alone do not characterize it uniquely.
For instance, the speed and height/weight ratio of foxes and dogs are comparable, so other
characteristics, such as face shape, are needed to distinguish between them.

FIGURE 6.4
Learning animals through classification. (Axis label: height/weight.)


In many cases, it can be difficult to identify the features themselves, especially when
the problem’s characteristics are unclear. Biological classification problems are one
example.

The classification of patterns is crucial in practically every field of science, not only
biology. For instance, in psychology, pattern classification is used to categorize patients
with various mental illnesses so that patients in the same category can be treated with the
same therapy [39]. Criminologists classify fingerprints into common categories before
matching a suspect's fingerprint against the database of known fingerprints of that class.
The first stage in pattern classification is to extract the features from the dataset [65].


6.2.3 Reinforcement Learning 


In reinforcement learning, the learner adjusts its parameters based on the status (reward
or punishment) of the environment's feedback signal [41, 42]. Learning automata use the
simplest type of reinforcement learning. Using the reward/punishment status of the
feedback signal has also led to the development of Q-learning and temporal difference
learning [43]. Figure 6.5 shows the types of reinforcement learning.


6.2.3.1 Learning Automata 


FIGURE 6.5
Types of reinforcement learning. (Diagram nodes: reinforcement learning, learning automata,
adaptive dynamic programming, Q-learning, temporal difference learning.)

The learning automaton is the most popular of the well-known reinforcement learning
techniques [44]. Two modules make up the learning mechanism of such a system: the
environment and the learning automaton. The learning cycle begins when the environment
generates a stimulus. The automaton responds to the environment after receiving the
stimulus [45]. The environment receives the response, evaluates it, and provides the
automaton with a fresh stimulus. Based on its most recent response and its current input
(stimulus), the automaton then automatically modifies its parameters [46]. The guiding
principles of learning automata can be applied to numerous real-world problems.
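
One common way to realize this cycle is the linear reward-inaction scheme, which is used here as an assumed example because the chapter does not name a specific update rule. In the Python sketch below, the automaton keeps a probability for each action, the environment rewards an action with a fixed probability, and the automaton increases the probability of an action only when it is rewarded. The reward probabilities, learning rate, and seed are illustrative assumptions.

```python
import random


def run_automaton(reward_prob, steps=2000, rate=0.05, seed=0):
    """Linear reward-inaction automaton in a stationary random environment."""
    rng = random.Random(seed)
    n = len(reward_prob)
    p = [1.0 / n] * n  # action probabilities: the automaton's parameters
    for _ in range(steps):
        action = rng.choices(range(n), weights=p)[0]      # automaton responds
        rewarded = rng.random() < reward_prob[action]     # environment evaluates
        if rewarded:
            # Move probability mass toward the rewarded action; the sum stays 1.
            p = [p_j + rate * (1.0 - p_j) if j == action else p_j * (1.0 - rate)
                 for j, p_j in enumerate(p)]
        # On punishment the probabilities are left unchanged (inaction).
    return p


print(run_automaton([0.9, 0.1]))
```

With these settings the probability of the more frequently rewarded action approaches 1, which shows how the automaton adapts using only the reward or punishment status of the feedback.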


6.2.3.2 Adaptive Dynamic Programming 


In reinforcement learning, the agent is assumed to receive a response from the environment,
but it can determine its status (rewarding or punishing) only at the terminal stage of
its activity [47, 48]. We also assume that the agent starts in state $S_0$ and moves to
state $S_1$ after acting on the environment. This change of state from $S_0$ to $S_1$ as a
result of action $a_0$ is denoted by $S_0 \xrightarrow{a_0} S_1$. An agent's reward can also
be represented by a utility function. The utility of a Ping-Pong agent, for instance, might
be the number of points it scores [49]. In reinforcement learning, the agent can be passive
or active. A passive learner attempts to learn the utility by observing it in various
states. An active learner, on the other hand, can infer the utility at unknown states from
the knowledge it has acquired.


6.2.3.3 Q-learning 


Q-learning is an off-policy method and one of the most widely used reinforcement learning
approaches [50]. Early Q-learning algorithms supported only a limited set of applications
and were unsatisfactory in a number of ways. It has also been noted that this powerful
algorithm occasionally learns unrealistically high values because it overestimates the
values of the actions taken, which lowers overall performance [51]. Recent developments in
machine learning have led to the discovery and widespread use of further Q-learning
variants, such as deep Q-learning, which combines basic Q-learning with deep neural
networks (DNNs) [52].


6.2.3.3.1 Basic Q-Learning


Basic Q-learning separates the acting (behavior) policy from the learning policy, in
contrast to earlier algorithms that do not distinguish between behavior and learning [51].
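
The separation between the acting policy and the learning policy can be seen in a small tabular sketch. The five-state corridor, the reward of 1 for reaching the last state, and the parameter values below are assumptions for illustration. The agent acts uniformly at random, yet the standard Q-learning update always bootstraps from the greedy (maximum) value of the next state.

```python
import random


def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a corridor; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # acting policy: uniformly random
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Learning policy: greedy with respect to the current estimates.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


for state, values in enumerate(q_learning()):
    print(state, [round(v, 3) for v in values])
```

Although the agent never follows the greedy policy while acting, the learned values rank "right" above "left" in every non-terminal state, which is the off-policy property described above.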


6.2.3.3.2 Deep Q-Learning


Deep Q-learning, created by Google DeepMind, combines convolutional neural networks (CNNs)
with basic Q-learning. When representing the value function explicitly for each state
becomes impractical, deep Q-learning uses a CNN to approximate the function [51]. In
addition to value approximation with a CNN, deep Q-learning includes two techniques [52]:
a target Q-network and experience replay.


6.2.3.3.3 Hierarchical Q-Learning 


In order to address the issues that develop as the state-action space of Q-learning expands, 
hierarchical Q-learning was created [53, 54]. 


6.2.3.3.4 Double Q-Learning 


Double Q-learning was created to address the poor performance of Q-learning in stochastic
environments. In a stochastic setting, Q-learning is biased because the agent’s action
values are overestimated [55].


6.2.3.3.5 Multi-Agent Q-Learning


Multi-agent modular Q-learning was developed to address the inefficiency of basic
Q-learning in multi-agent systems [56]. According to [57], the state space for each agent
grows exponentially as the number of agents increases, so the number of states and the
memory required can explode.


6.2.3.4 Temporal Difference Learning 


Temporal difference (TD) learning, an unsupervised learning technique, is frequently
employed in reinforcement learning (REL) to predict the total reward expected over the
long term [58, 59]. However, unlike other algorithms, it can also be used to forecast the
test dataset. In essence, this method identifies the potential input parameter before
forecasting a new dataset. With the TD learning method, the long-term value of a behavior
pattern can be calculated from a succession of intermediate rewards. Later predictions
serve as the training signal for earlier predictions in TD learning. This approach
combines ideas from the Monte Carlo (MC) and dynamic programming (DP) approaches [60-62].
Temporal difference methods adjust their predictions to match subsequent, more accurate
predictions well before the final outcome is known, in contrast to Monte Carlo methods,
which adjust their estimates only after the final outcome is known.
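
The way later predictions train earlier ones can be sketched with the TD(0) update, a standard form of TD learning that is used here as an assumed example. In the Python sketch below, the three-step chain, the single terminal reward of 1, and the step size are illustrative choices. Each update moves the value of a state toward the reward plus the current prediction for the next state, without waiting for the episode to end.

```python
def td0(episodes, n_states, alpha=0.5, gamma=1.0):
    """TD(0) value prediction; each step is (state, reward, next_state)."""
    v = [0.0] * n_states  # terminal states are never updated and stay at 0
    for episode in episodes:
        for state, reward, next_state in episode:
            # The prediction for next_state is the training signal for state.
            v[state] += alpha * (reward + gamma * v[next_state] - v[state])
    return v


# Chain 0 -> 1 -> 2 -> 3 (terminal); only the last step is rewarded.
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]
print(td0([episode], n_states=4))        # after a single episode
print(td0([episode] * 200, n_states=4))  # after many episodes
```

After one episode only the state next to the goal has a nonzero value. With repeated episodes the information propagates backward until every non-terminal state predicts the long-term reward of 1.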


6.2.4 Learning by Inductive Logic Programming (ILP) 


Inductive logic programming (ILP) is the branch of machine learning that uses first-order
logic (FOL) to express hypotheses and data. Because FOL is expressive and declarative, ILP
primarily addresses problems with structured data and background knowledge [63].
Through "upgrades" of existing propositional machine learning systems, ILP addresses a
wide range of machine learning problems, such as clustering, classification, reinforcement
learning, and regression. It uses logic for knowledge representation and reasoning.
Applications such as web mining, natural language processing, and bio- and
chemo-informatics have all benefited from ILP systems. The primary benefit of ILP is that
it overcomes the representational constraints of attribute-value learning systems [64].
The use of ILP is also encouraged by its declarative logic language, which makes the
learned theories interpretable and comprehensible.


6.3 Summary 


The four main categories of machine learning algorithms are covered in this chapter, along
with some of their subcategories. Decision trees, version spaces, and analogical learning
have been used to introduce the ideas of supervised learning. Unsupervised learning
concepts are explained with an appropriate example. Reinforcement learning, a more recent
development than supervised learning, has been covered through learning automata, adaptive
dynamic programming, Q-learning and its variants, and temporal difference learning.
Inductive logic programming (ILP) remains an active area of machine learning research, and
the theory may be extended in the future to discover knowledge in real-world systems.


Parametric Optimization of n-Channel JFET Using Bio Inspired Optimization Techniques


7.1 Introduction 


The field effect transistor is one of the most fundamental and prevalent devices in
electronic circuit applications [1, 2]. In this type of transistor, an electric field
controls the current flowing through a three-terminal semiconductor device [3]. In an
n-channel junction field effect transistor (JFET), the channel is an n-type semiconductor.
The current-voltage (I-V) characteristics of a JFET can be described by the drain
characteristics (where $V_{DS}$ regulates the current $I_D$) and the transfer
characteristics (where $V_{GS}$ regulates the current $I_D$). In the drain characteristics,
the current of a JFET typically depends on the supply voltage at the drain terminal
($V_{DD}$) and the drain resistance ($R_D$), and in the transfer characteristics, the drain
current depends on the saturated drain current $I_{DSS}$ and the pinch-off voltage ($V_P$)
[4, 5]. In each case, one set of control parameters is kept constant during the
optimization of the model parameters. In order to improve, inspect, and assess the
effectiveness of JFET-based systems, engineers must have a precise understanding of the
n-channel JFET characteristics from experimental evidence. This requires a high-precision
quantitative model of the nonlinear I-V relationship of the FET. Modeling a JFET is
therefore essentially a nonlinear optimization problem [6].

Over the past few years, metaheuristic methods, usually inspired by natural phenomena,
animal behavior, or evolutionary concepts, have grown highly popular [7, 8]. Swarm
intelligence (SI), however, is the most common and effective class of metaheuristic
algorithms. Examples of such metaheuristics include the genetic algorithm (GA) [9, 10],
particle swarm optimization (PSO) [11], cuckoo search optimization (CSO) [12], gray wolf
optimization (GWO) [13], ant colony optimization (ACO) [14, 15], bat algorithm (BA)
[16, 17], elephant swarm water search algorithm (ESWSA) [18, 19], firefly algorithm (FA)
[20, 21], artificial bee colony optimization (ABCO) [22, 23], flower pollination algorithm
(FPA) [24-26], and differential evolution (DE) [11, 27], among others. In the literature,
many researchers have used metaheuristics such as BA, CSO, FA, FPA, evolutionary
algorithms, GWO, and ABCO for the parameter estimation problems of different semiconductor
devices. To the best of the authors' knowledge, however, no one has attempted to use an
optimization algorithm to identify the parameters of the n-channel JFET drain and transfer
characteristics. FPA is a recently developed, effective, and popular SI-based metaheuristic
that draws inspiration from flower pollination. It has so far been used effectively to
solve a variety of global optimization, multimodal optimization, constrained optimization,
structural engineering, and reverse engineering problems, among others.

In this research, three bio-inspired optimization techniques (FPA, PSO, and CSO) are
used for parametric optimization of the drain and transfer characteristics of an n-channel
JFET. The remainder of the chapter is arranged as follows. The mathematical modeling of the
parameter estimation problem and the FPA approach are described in Section 7.2. The
discussion then moves on to the experimental data, methods, and simulation results.
References are listed after the conclusion.


7.2 Mathematical Description 
7.2.1 Current Equation for JFET 


The current (I) in the drain characteristics of an n-channel JFET [28] can be given as in
Eq. (7.1):

$$V_{DS} = V_{DD} - I_D R_D$$

$$V_{DS} = V_{DD} - R_D I_{DSS} \left(1 - \frac{V_{GS}}{V_P}\right)^2 \qquad (7.1)$$

The current (I) in the transfer characteristics of an n-channel JFET can be given as in
Eq. (7.2):

$$I_D = I_{DSS} \left(1 - \frac{V_{GS}}{V_P}\right)^2 \qquad (7.2)$$

where $I_D$ is the drain current, $I_{DSS}$ is the saturated drain current, $V_{DD}$ is the
supply voltage at the drain terminal, $V_{DS}$ is the drain-to-source voltage, $V_{GS}$ is the
gate-to-source voltage, $V_P$ is the pinch-off voltage, and $R_D$ is the drain resistance.
The drain characteristics model of the JFET thus contains two parameters ($V_{DD}$ and $R_D$),
and the transfer characteristics model contains two parameters ($I_{DSS}$ and $V_P$) to be
estimated.
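
Equations (7.1) and (7.2) translate directly into two small functions. The Python sketch below uses hypothetical function names and sample values that are not measurements from this study. It assumes that $V_{GS}$ and $V_P$ are given with the same sign convention, that the gate voltage stays between zero and pinch-off, and that current in mA is paired with resistance in kohm so that their product is in volts.

```python
def drain_current(v_gs, i_dss, v_p):
    """Eq. (7.2): drain current from the gate-source voltage."""
    return i_dss * (1.0 - v_gs / v_p) ** 2


def drain_source_voltage(v_gs, v_dd, r_d, i_dss, v_p):
    """Eq. (7.1): drain-source voltage after the drop across R_D."""
    return v_dd - r_d * drain_current(v_gs, i_dss, v_p)


# Illustrative values only: I_DSS = 10 mA, V_P = -4 V, V_DD = 12 V, R_D = 1 kohm.
print(drain_current(-1.0, 10.0, -4.0))                    # mA
print(drain_source_voltage(-1.0, 12.0, 1.0, 10.0, -4.0))  # V
```

In the parameter estimation problem these functions play the role of the model whose parameters are adjusted until the computed values match the laboratory readings.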


7.2.2 Flower Pollination Algorithm 


According to Abdel-Basset and Shawky [29] and Nguyen et al. [30], flower pollination
is the transfer of pollen that enables plant reproduction, with insects, birds, and bats
serving as the primary pollinators. The flower pollination algorithm (FPA) [31], a
recently proposed metaheuristic, is based on four idealized rules of pollination. Biotic
cross-pollination is treated as global pollination, in which pollen-carrying pollinators
perform Lévy flights (Rule 1). Abiotic pollination and self-pollination are treated as
local pollination (Rule 2). Pollinators may develop flower constancy, which corresponds to
a reproduction probability that depends on how closely the two flowers involved resemble
one another (Rule 3). The switch between local and global pollination is controlled by a
switch probability $p \in [0, 1]$ that is slightly biased toward local pollination
(Rule 4). Each flower or pollen in this case represents a solution to the optimization
problem. The following two equations represent global and local pollination (search),
respectively:

$$x_i^{t+1} = x_i^t + \gamma \, L(\lambda) \, (g_* - x_i^t) \qquad (7.3)$$

$$x_i^{t+1} = x_i^t + \epsilon \, (x_j^t - x_k^t) \qquad (7.4)$$

where $g_*$ is the best solution found among all solutions at the current iteration,
$L(\lambda)$ is a step drawn from a Lévy distribution, $x_j^t$ and $x_k^t$ are pollen from
different flowers of the same plant species, $\gamma$ is a scaling factor that controls the
step size, and $\epsilon$, the random walk step size, is drawn from a uniform distribution in
[0, 1]. $x_i^t$ is the pollen i, or solution vector $x_i$, at iteration t. FPA was chosen as
an optimization approach because it provides higher resolution and fidelity than other
well-known metaheuristic methods [28, 32].
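
A compact Python sketch shows how the global and local pollination steps of Eqs. (7.3) and (7.4) fit into one search loop. It is not the implementation used for the results in this chapter. The function names, the Mantegna method for drawing Lévy steps, the greedy replacement of a flower only when the new position is better, the clipping to the search bounds, and all default parameter values (including `p_global`, the probability of taking a global step, set below 0.5 so that local pollination is slightly favored) are assumptions made for illustration.

```python
import math
import random


def levy_step(rng, lam=1.5):
    """One Levy-distributed step drawn with Mantegna's method (assumed choice)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    return rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / lam)


def flower_pollination(objective, bounds, n_flowers=25, iterations=300,
                       p_global=0.4, gamma=0.1, seed=0):
    """Minimize `objective` over the box `bounds` with a basic FPA loop."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    flowers = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_flowers)]
    fitness = [objective(x) for x in flowers]
    best = min(range(n_flowers), key=lambda i: fitness[i])
    g_best, f_best = flowers[best][:], fitness[best]

    for _ in range(iterations):
        for i in range(n_flowers):
            if rng.random() < p_global:
                # Global pollination, Eq. (7.3): Levy flight relative to g_best.
                cand = [x + gamma * levy_step(rng) * (g - x)
                        for x, g in zip(flowers[i], g_best)]
            else:
                # Local pollination, Eq. (7.4): difference of two random flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                cand = [x + eps * (a - b)
                        for x, a, b in zip(flowers[i], flowers[j], flowers[k])]
            cand = clip(cand)
            f_cand = objective(cand)
            if f_cand < fitness[i]:  # keep the new pollen only if it is better
                flowers[i], fitness[i] = cand, f_cand
                if f_cand < f_best:
                    g_best, f_best = cand[:], f_cand
    return g_best, f_best


def sphere(x):
    return sum(v * v for v in x)


print(flower_pollination(sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

The sphere function is used only as a simple test objective. For the JFET problem the objective would be the RMSE between measured and modeled values, with the two bounds set to the search ranges of the two model parameters.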


7.2.3 Objective Function 


To assess the quality of a solution, all optimization techniques employ a performance
index or fitness value. The estimation task seeks the parameter values that minimize the
discrepancy between the measured and simulated currents. The root mean square error
(RMSE), defined in Eqs. (7.5) and (7.6), can be used as the objective function [13]:

$$\mathrm{RMSE}(X) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} f_i(V_{DD}, R_D, X)^2} \qquad (7.5)$$

for drain characteristics, and

$$\mathrm{RMSE}(X) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} f_i(I_{DSS}, V_P, X)^2} \qquad (7.6)$$

for transfer characteristics,

where N is the number of experimental observations (that is, the set of measured n-channel
JFET voltages and currents), $f_i$ is the error between the measured and modeled values for
the ith observation, and X is the set of estimated parameters: $X = \{V_{DD}, R_D\}$ for
drain characteristics and $X = \{I_{DSS}, V_P\}$ for transfer characteristics. Three
bio-inspired optimization techniques, PSO, FPA, and CSO, are employed to minimize this
function so that the best value of $X = \{V_{DD}, R_D\}$ for drain characteristics and of
$X = \{I_{DSS}, V_P\}$ for transfer characteristics can be obtained.
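
The objective function for the transfer characteristics can be sketched in Python as follows. The error term is taken to be the difference between the measured drain current and the current predicted by the transfer equation, which is an assumption consistent with the description above. The function names are hypothetical, and the readings are synthetic values generated for illustration, not data from the laboratory experiment.

```python
import math


def transfer_model(v_gs, i_dss, v_p):
    """Transfer characteristic: I_D = I_DSS * (1 - V_GS / V_P)^2."""
    return i_dss * (1.0 - v_gs / v_p) ** 2


def rmse(params, observations, model):
    """Root mean square error between measured and modeled values."""
    errors = [measured - model(x, *params) for x, measured in observations]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic (V_GS, I_D) readings generated with I_DSS = 12 mA and V_P = 4 V.
readings = [(0.0, 12.0), (1.0, 6.75), (2.0, 3.0), (3.0, 0.75)]
print(rmse((12.0, 4.0), readings, transfer_model))  # parameters that generated the data
print(rmse((10.0, 4.0), readings, transfer_model))  # a poorer candidate
```

An optimizer such as PSO, FPA, or CSO would call `rmse` with candidate parameter sets X and keep the set that gives the smallest value.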


7.3 Methodology 


The overall process of metaheuristic-based optimization of the n-channel JFET model
parameters consists of two main phases: (1) laboratory measurement of a number of JFET
voltages and the corresponding drain currents for the drain and transfer characteristics,
and (2) implementation of PSO, FPA, and CSO to optimize the model parameters of the JFET.
Table 7.1 gives the IC specification of the n-channel JFET. The details of these steps are
shown in Figure 7.2.

For the laboratory experiment, a J112A n-channel JFET is connected as shown in Figure 7.1.
Two variable DC power supplies are applied: one in the input circuit to provide the
reverse bias $V_{GS}$ and one in the output circuit to provide the forward bias $V_{DS}$.
Next, (1) for the drain characteristics, $V_{GS}$ is kept constant while the output circuit
voltage $V_{DS}$ is progressively raised and the corresponding drain current is recorded;
and (2) for the transfer characteristics, the output voltage $V_{DS}$ is kept constant while
$V_{GS}$ is gradually increased and the corresponding drain current is measured.

TABLE 7.1
J112A N-Channel JFET Specification

Parameter                                        Value
Gate-source breakdown voltage, V_GS (V)          35
Gate-source cut-off voltage, V_GS-off (V)        -5
Zero gate voltage drain current, I_DSS (mA)      5
Drain-source resistance, R_DS (ohm)              50
Operating temperature range (°C)                 -65 to 150
Package                                          Metal TO-92


FIGURE 7.1
Circuit diagram for determining the drain and transfer characteristics of n-channel JFET.

In the next phase of this work, PSO, FPA, and CSO are used to optimize the n-channel JFET
parameters. In the present problem, the dimension of the search space is 2, as the input
variables of the optimization process are $X = \{V_{DD}, R_D\}$ for drain characteristics
and $X = \{I_{DSS}, V_P\}$ for transfer characteristics. The algorithms are trained and
evaluated using the experimental data, with RMSE employed as the fitness function. The
population size and total number of iterations for PSO, FPA, and CSO are set to 100 and
1000, respectively. The lower and upper limits, that is, the search ranges, for the supply
voltage and drain resistance are chosen as [6 V, 12 V] and [1 kohm, 10 kohm] for drain
characteristics, and those for the zero gate voltage drain current and pinch-off voltage
are chosen as [10 mA, 15 mA] and [3 V, 5 V], respectively, for transfer characteristics.
After the iterations, the best optimization technique arrives at the optimal solution, that
is, the set of model parameters that minimizes RMSE. Under ideal conditions, the calculated
I-V characteristics should be almost identical to the JFET's experimental I-V
characteristics.

FIGURE 7.2
Stepwise methodologies for the present research. (Flowchart steps: experiment on the I-V
characteristics of the n-channel JFET for drain and transfer characteristics performed in
the laboratory; implementation of three bio-inspired algorithms, PSO, FPA, and CSO, to
optimize the model parameters, using experimental data as training data and RMSE as
objective function; get the optimal model parameters of the J112A n-channel JFET.)


7.4 Result and Discussion 


The findings of this work are presented and discussed in this section in order to draw
some significant conclusions. Initially, a total of 15 readings of drain characteristics
were taken for three different values of the control voltage VGS (0 V, -1 V, and -1.5 V),
recording voltage VDS and current ID, while for the transfer characteristics a total of 17
readings of VGS and current ID were taken for the n-channel JFET. The experimental transfer
characteristic of the n-channel JFET is shown in Figure 7.3. Next, three bio-inspired
algorithms, PSO, FPA, and CSO, have been applied to this experimental dataset to find
the optimal values of supply voltage and drain resistance for drain characteristics, and the
zero gate voltage drain


[Figure 7.3 plot: drain current Id versus VGS (Volt); curves: Exp(Id), PSO(Id), FPA(Id),
CSO(Id).]

FIGURE 7.3
Transfer characteristics of n-channel JFET.




TABLE 7.2
Comparative Study Based on Accuracy and Average Computational Time in Drain Characteristics

Name of the Algorithm    Avg. Computational Time (sec)    RMSE (%)
PSO                      256.21                           5.42
FPA                      129.32                           3.41
CSO                      343.652                          9.18

TABLE 7.3
Comparative Study Based on Accuracy and Average Computational Time in Transfer Characteristics

Name of the Algorithm    Avg. Computational Time (sec)    RMSE (%)
PSO                      221.41                           4.32
FPA                      147.31                           2.19
CSO                      289.54                           7.52


current and pinch-off voltage for transfer characteristics. Each of the bio-inspired
algorithms has been run 10 times using the configuration described previously, because the
results vary from run to run with the random initialization and the stochastic nature of the
search. The final results were then obtained by a statistical analysis. The fitness value, or
RMSE [33, 34], of each algorithm is shown in Tables 7.2 and 7.3. It is evident that FPA is
consistently capable of achieving the minimum values of 0.00341 and 0.00219 for drain and
transfer characteristics, or values close to them. The overall performance of FPA for this
JFET optimization task can thus be regarded as very good.
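
The repeated-run procedure can be illustrated with a short Python sketch. It is a minimal
illustration under stated assumptions: a plain uniform random search stands in for PSO, FPA,
and CSO (none of which is implemented here), the fitness uses the Shockley square-law model
with synthetic readings rather than the laboratory data, and the names model_id, fitness, and
random_search are hypothetical.

```python
import math
import random
import statistics

BOUNDS = [(10.0, 15.0), (3.0, 5.0)]  # IDSS (mA), VP magnitude (V), from the text


def model_id(vgs, idss, vp):
    """Assumed Shockley square-law drain current (mA)."""
    return idss * (1.0 - vgs / vp) ** 2 if vgs < vp else 0.0


VGS = [0.2 * k for k in range(17)]
ID_MEAS = [model_id(v, 12.0, 5.0) for v in VGS]  # synthetic, not lab data


def fitness(params):
    """RMSE between the synthetic readings and the model."""
    idss, vp = params
    err = [(m - model_id(v, idss, vp)) ** 2 for v, m in zip(VGS, ID_MEAS)]
    return math.sqrt(sum(err) / len(err))


def random_search(seed, iterations=1000):
    """Stand-in optimizer: uniform sampling inside the bounds."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(iterations):
        candidate = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
        best = min(best, fitness(candidate))
    return best


# Ten independent runs, each with its own seed, as in the text.
results = [random_search(seed) for seed in range(10)]
print("best  :", min(results))
print("mean  :", statistics.mean(results))
print("stdev :", statistics.stdev(results))
```

Reporting the best, mean, and spread over the runs separates the quality of an algorithm from
the luck of a single random start.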

Computational time [35] indicates the time taken by the optimization algorithm to reach the
minimum fitness value of the objective function. Tables 7.2 and 7.3 show the computational
time taken by the applied optimization algorithms. In both cases FPA took the least
computational time to reach the lowest value of the drain current fitness function
(Figures 7.4 and 7.5).

The predicted value of the drain current was then determined using Equations (7.1) and
(7.2) along with the aforementioned parameters. The observed I-V characteristics of the
n-channel JFET are compared with the computed output in Figures 7.3 through 7.6.
Figure 7.6 shows that the experimental I-V characteristics of the n-channel JFET and the
computed or simulated values differ very little. This agreement validates the technique
suggested for determining the JFET's optimal parameters, with the goal of minimizing
the difference between estimated and experimental values of the drain current. From the
characteristic graphs it is seen that FPA produces the best-fitting drain current among the
algorithms, while CSO performed worst. Convergence speed [36] is another criterion for
identifying the best algorithm for the optimization of any function. If the fitness value
decreases or remains steady as the number of iterations increases, the algorithm is
converging, and the algorithm that settles at the lowest value soonest is preferred for the
given problem. Figures 7.7 and 7.8 represent the convergence speed of
the three algorithms, CSO, FPA, and PSO, for drain and transfer characteristics, respectively.
In both cases FPA outperformed CSO and PSO. In this research we consider the set
X = {VDD, RD} for drain characteristics and X = {IDSS, VP} for transfer characteristics
corresponding to the best fitness, that is, the least RMSE, as the final output. Table 7.4
shows the optimal parameter values of the J112A n-Channel JFET for the two above-mentioned
conditions.
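
A convergence curve and a computational-time measurement of the kind compared here can be
recorded as in the following Python sketch. It is only an illustration: a uniform random
search and a toy sphere objective stand in for the metaheuristics and the RMSE fitness
function, and the names random_search_history and iterations_to_reach are hypothetical.

```python
import random
import time


def sphere(x):
    """Toy objective standing in for the RMSE fitness function."""
    return sum(v * v for v in x)


def random_search_history(objective, bounds, iterations, seed=0):
    """Return the best-so-far objective value after every iteration."""
    rng = random.Random(seed)
    best = float("inf")
    history = []
    for _ in range(iterations):
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = min(best, objective(candidate))
        history.append(best)
    return history


def iterations_to_reach(history, threshold):
    """First iteration (1-based) at which the curve drops to the threshold."""
    for k, value in enumerate(history, start=1):
        if value <= threshold:
            return k
    return None  # threshold never reached


start = time.perf_counter()
history = random_search_history(sphere, [(-5.0, 5.0)] * 2, iterations=1000)
elapsed = time.perf_counter() - start

print("final best value:", history[-1])
print("iterations to reach 1.0:", iterations_to_reach(history, 1.0))
print("elapsed seconds:", elapsed)
```

Because the best-so-far value can never increase, the curve either falls or stays flat; a
faster algorithm reaches a given threshold in fewer iterations.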
TABLE 7.4
Optimal Values for Drain and Transfer Characteristics at Different Conditions

             Transfer Characteristics      Drain Characteristics
Algorithm    IDSS (mA)    VP (volt)        VDD (volt)    RD (kΩ)
CSO          10.98        4.9              11.9          4.8
FPA          12           5                12            5
PSO          11.85        5                12            5

[Figure 7.4 plot: drain current versus VDS (Volt) at VGS = 0 V; legend entries include
PSO(Id) and FPA(Id).]

FIGURE 7.4
Drain characteristics when VGS is set at 0 V.

FIGURE 7.5
Drain characteristics when VGS is set at -1 V.

FIGURE 7.6
Drain characteristics when VGS is set at -1.5 V.

FIGURE 7.7
Convergence speed of the proposed algorithm for drain characteristics.

FIGURE 7.8
Convergence speed of the proposed algorithm for transfer characteristics.


7.5 Conclusion 


An essential challenge in the field of electronics is the determination of FET parameters. An
effective and precise approach is suggested in this study to determine the optimal parameters
of the J112A n-Channel JFET. Three distinct bio-inspired metaheuristic optimization
techniques, PSO, FPA, and CSO, were employed, and the objective of the optimization process
for the drain and transfer characteristics was to minimize the difference between the
predicted and experimental values of the drain current. For the best model identification,
three important criteria were chosen, namely RMSE on the training dataset, computational
time, and convergence speed. In each case the calculated I-V characteristic is similar to the
experimental I-V characteristic of the n-channel JFET, which validates the proposed
methodology, but RMSE, computational time, and convergence speed all favor FPA over PSO
and CSO. In the future, other cutting-edge and hybrid optimization techniques might make
parameter optimization for JFETs more effective and precise.
8


AI-Based Model of Clinical and 
Epidemiological Factors for COVID-19 


8.1 Introduction 


The first case of pneumonia of unknown cause was found in December 2019 [1].
Since then, the virus epidemic has caused grave alarm, and on January 30, 2020, the World
Health Organization (WHO) declared the outbreak a Public Health Emergency of International
Concern. COVID-19, the disease caused by a novel coronavirus, was subsequently characterized
as a pandemic on March 11, 2020. With 3,181,642 cases and 224,301 deaths across 215 nations
and territories as of May 1, 2020, the novel coronavirus infection (COVID-19) caused shock
and alarm throughout the world [2, 3]. At that time no effective vaccine or antiviral
treatment had been developed, so these numbers were anticipated to increase drastically. The
only effective method for halting the spread of the virus was preventative measures such as
lockdown. Accurate and timely information regarding the cause and epidemiological features of
this worldwide pandemic may prove to be a viable means of containing the virus [4-6]. Finding
the causes and spatial dissemination of these newly emerging infectious diseases of zoonotic
origin is extremely difficult due to their high transmissibility [8]. These complicated
issues can be resolved with the effective integration of techniques such as epidemiology, AI
approaches, and bioinformatics. The Huanan Seafood Wholesale Market in Wuhan, where the first
case of COVID-19 was discovered, was thought to be the primary site of the disease's
propagation from animal to human. However, subsequent cases were not found to be related to
the same exposure, and hence symptomatic human-to-human transmission was determined
to be the primary factor in the spread of COVID-19. At the same time, presymptomatic and
asymptomatic transmission were considered equally probable routes for the disease's spread
[7, 9]. Although a number of demographic parameters, such as gender, age, and blood type,
have historically been categorized as infection risk factors, biostatistical studies have not
yet developed sufficiently to explore complex correlations between numerous variables
[10]. With the development of applications of neural networks, fuzzy logic, and evolutionary
algorithms, computers can now identify correlations between variables and determine
which ones are most promising [11, 12].

In this research the authors mainly focus on the broad categorization of risk and preventive
variables that contribute to the spread of COVID-19. To represent how these elements
interact and depend on one another, NN models based on different training functions are
presented. Neural models based on eight different training functions are applied to a given
training dataset, and the one whose training function yields a regression value closest to
unity and the least mean square error (MSE) is considered the best-fitted neural model for
the present research. The manuscript is organized as follows: After the
introduction in Section 8.1, Section 8.2 covers the work related to the COVID-19 domain.
Section 8.3 describes the modeling of the neural network approach. Section 8.4 presents the
result analysis and discussion of the proposed model. Section 8.5 presents the conclusion.


8.2 Related Work 


The spread of the COVID-19 epidemic has drawn the attention of scientists and medical
practitioners. To create effective preventive measures to stop the spread of COVID-19, it is
necessary to comprehend the pattern of its unprecedented spread. The limited time span over
which it has been observed presents the biggest obstacle to fully comprehending this pattern.
Therefore, much research is being done by scientists to understand its epidemic nature, which
helps them predict the rise in the number of infected persons with more accuracy. The recent
studies pertaining to the forecast of its outbreak are briefly covered in this section.
Table 8.1 summarizes previous studies conducted by several researchers.


TABLE 8.1
Survey of Previous Work

Sl. No.  Reference                Survey

1        Li et al. [13]           Proposed a Gaussian distribution to explain the different stages of
                                  coronavirus transmission. The authors simulated the model using
                                  information on the epidemic situation in Hubei and assert that there
                                  is a relatively minor discrepancy between model predictions and
                                  actual values. The suggested approach can therefore serve as a
                                  foundation for epidemic prevention and control in the afflicted
                                  countries.

2        Okhuese [14]             The author studied the susceptible-exposed-infectious-removed (SEIR)
                                  model using migration data until January 23, 2021, together with the
                                  most recent COVID-19 epidemiological data, to comprehend the epidemic
                                  curve. The SEIR model additionally incorporates AI, and this improved
                                  SEIR model was trained using SARS data from 2003. The effectiveness
                                  of the model was also verified by forecasting the Ebola virus
                                  outbreak of 2018. The author asserts that this dynamic SEIR model
                                  accurately forecasts the peaks and magnitude of the epidemic.

3        Al-Najjar and            The authors designed a neural network based classifier model to
         Al-Rousan [15]           determine how a patient reacts to a therapy. The model was
                                  implemented using a dataset from February 20, 2020, to March 9, 2020,
                                  pertaining to recovered patients and patients who passed away. Seven
                                  variables (country, area, infection reason, confirmation date, birth
                                  year, sex, and group) are used in the proposed model. The classifier
                                  looks for the characteristics that most accurately predict death or
                                  recovery.

4        Cao et al. [16]          The authors applied knowledge-based short-term preventative
                                  strategies that aid in the development of various control measures to
                                  stop the further spread of the virus. To this end, they provide a
                                  time series model for short-term COVID-19 forecasting and a dynamic
                                  model of the epidemic. According to the authors, the forecasts were
                                  accurate.


5        Hellewell et al. [17]    The authors proposed a stochastic transmission model to evaluate the
                                  efficacy of contact tracing and isolation. The model calculates the
                                  number of primary infected individuals and the number of secondary
                                  infected individuals, which contribute to calculating the
                                  reproduction number (R0). The infection rate increases exponentially
                                  when R0 rises above 1, whereas it could decline exponentially if R0
                                  stays below 1.

6        Sharma et al. [18]       The authors employed data analytics to demonstrate how R0 can be
                                  managed by limiting interpersonal contact using a quarantine
                                  paradigm. This work also asserts that more stringent measures must
                                  be taken because the quarantine approach alone is insufficient to
                                  stop the spread of the disease.

7        Ranjan [19]              According to the author, the value of R0 for India falls within the
                                  predicted range of 1.4-3.9. Predictions, both long-term and
                                  short-term, are made using exponential and traditional
                                  susceptible-infected-recovered (SIR) models. According to this model,
                                  equilibrium will be reached in India by May 2020. The author also
                                  notes that the prediction is based on high degrees of social
                                  distancing; it is invalid if community transmission occurs in India.

8        Bannister-Tyrrell        The authors took into account how the environment affected the
         et al. [20]              propagation of COVID-19 and reported statistics on the impact of
                                  temperature on its spread in Europe. According to the study, higher
                                  temperatures reduce the spread of the virus.

9        Lu et al. [21]           The authors researched the connection between ventilation and the
                                  epidemic. They concluded that, since droplets linger in the air
                                  longer in an air-conditioned environment, droplet transmission is
                                  enhanced. The direction of air movement is also a crucial
                                  consideration in this case. In addition to environmental influences,
                                  research also tries to understand how physical health relates to
                                  the disease.

10       Rosita [22]              To determine the prevalence of comorbidities in COVID-19 patients,
                                  the author presents an analysis based on a survey of several
                                  databases, including PubMed, EMBASE, and Web of Science. According
                                  to this study, severe patients are at an increased risk due to
                                  existing conditions compared to non-severe patients.

11       Dutta et al. [23]        The authors designed a machine learning based model to predict
                                  whether a person is affected by COVID-19 on the basis of the symptoms
                                  shown in his or her health status. Cold, fever, cough, body pain, and
                                  malaise were the most common potential symptoms of COVID-19.

12       Dutta et al. [24]        The authors proposed four machine learning models to classify
                                  COVID-19 among other diseases such as jaundice, malaria, common cold,
                                  typhoid, dengue, and pneumonia, based on feature selection and
                                  ranking methods.
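
The role of the reproduction number R0 that recurs in these studies can be made concrete with
a small Python sketch. It assumes the simplest possible branching picture, in which every
case produces R0 new cases in the next generation; this is an illustration only and is not
the model of any of the cited works. The function name cases_per_generation is hypothetical.

```python
def cases_per_generation(r0, initial_cases, generations):
    """Expected new cases in each generation of a simple branching process."""
    return [initial_cases * r0 ** n for n in range(generations + 1)]


growing = cases_per_generation(2.0, 10, 5)     # R0 above 1: exponential growth
shrinking = cases_per_generation(0.5, 10, 5)   # R0 below 1: exponential decline

print(growing)
print(shrinking)
```

With R0 = 2 the number of new cases doubles every generation, while with R0 = 0.5 it halves,
which is why control measures aim to bring R0 below 1.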


8.3 Artificial Neural Network Based Model 


To classify the COVID-19 data using an ANN based model, a multilayer feed-forward
artificial neural network with one hidden layer consisting of 20 nodes has been used
[25]. The input layer consists of 8 nodes, which are different symptoms and features of
COVID-19 patients. The output layer of the ANN model consists of
only one node, which indicates the predicted value (COVID 1 or 0), that is, the status of the
patient. Initially, the training dataset is used to train the ANN model with a
back-propagation algorithm. The trained ANN model is then cross-validated using the
training data itself, and the performance and accuracy are noted. Cross-validation is a
resampling procedure used to evaluate machine learning models on a limited data sample
[26]. Next, the trained model is tested against a new dataset that contains inputs for the
period April 1 through April 7, 2020. The performance and accuracy are also noted for the
new test cases. All the results are given in the results section.
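
The 8-20-1 structure described above can be sketched in plain Python as a single forward
pass. This is a minimal sketch under stated assumptions, not the MATLAB model used for the
reported results: the weights are random and untrained, the hidden layer is assumed to use
tanh, the output node is assumed to use a logistic function thresholded at 0.5, and the names
init_network, forward, and predict_status are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN = 8, 20  # layer sizes stated in the text


def init_network(seed=0):
    """Random, untrained weights; training would adjust these by back-propagation."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


def forward(x, network):
    """Feed-forward pass: tanh hidden layer, logistic output in (0, 1)."""
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def predict_status(x, network, threshold=0.5):
    """Map the single output node to the 0/1 patient status."""
    return 1 if forward(x, network) >= threshold else 0


network = init_network()
sample = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical symptom/feature flags
print(forward(sample, network), predict_status(sample, network))
```

Training with back-propagation would replace the random weights; the forward pass itself
stays the same.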


8.3.1 Modeling of Artificial Neural Network 
8.3.1.1 Collection, Preprocessing, and Division of Data 


To set up and implement the artificial neural structure, we first collect the
datasets of the clinical and epidemiological factors for the COVID-19 control framework
[27, 28]. As the performance of a neural network depends entirely on the input datasets
covering the full range of the inputs, care must be taken when arranging the experiment
itself. A multilayer network can be trained efficiently on data inside the range of the
inputs, but it cannot generalize beyond this range. The input data of the neural
network should be normalized before it is applied. When the net input to a neuron is greater
than about three, the output of a multilayer network becomes essentially saturated, since all
the hidden layers use a tan-sigmoid activation function. In such cases, the gradient will be
very small. With normalization applied in the preprocessing stage, the network output always
falls into a normalized range, and in the postprocessing stage it is transformed back to
follow the target output. To design the neural network model, the input and target datasets
are randomly divided into a training set (90%) and a testing set (10%).
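
The preprocessing steps can be illustrated with a short Python sketch. It assumes linear
min-max scaling onto [-1, 1], similar in spirit to MATLAB's mapminmax, takes the input ranges
reported in Table 8.5, and takes the split sizes (144 training and 18 test samples out of
162) reported in Table 8.4. The sample values and the names scale and scale_sample are
hypothetical.

```python
import math
import random

# Input ranges from Table 8.5: (min, max) for each feature.
RANGES = {
    "virulence": (4, 10),
    "immunity": (5, 10),
    "temperature": (15, 40),
    "population": (300, 900),
    "ventilation": (18, 45),
}


def scale(value, lo, hi):
    """Map value from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def scale_sample(sample):
    """Scale every feature of one sample with its own range."""
    return {name: scale(sample[name], *RANGES[name]) for name in RANGES}


raw = {"virulence": 7, "immunity": 10, "temperature": 15,
       "population": 600, "ventilation": 45}
scaled = scale_sample(raw)
print(scaled)

# Why scaling matters: tanh is flat for large arguments, so its slope
# (1 - tanh(z)^2), and with it the training gradient, nearly vanishes.
for z in (0.5, 600.0):
    print(z, 1.0 - math.tanh(z) ** 2)

# Random division of 162 sample indices into 144 training and 18 test samples.
indices = list(range(162))
random.Random(0).shuffle(indices)
train_idx, test_idx = indices[:144], indices[144:]
print(len(train_idx), len(test_idx))
```

An unscaled population value such as 600 would drive a tan-sigmoid neuron deep into
saturation, whereas the scaled value lies in the region where the slope is still useful.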


8.3.1.2 Implementation of Neural Network 


In the present research we used a dynamic, multilayer neural network to develop the model
of the nonlinear clinical and epidemiological factors for COVID-19. In
this proposal, we have utilized the feed-forward neural network design and the
back-propagation training algorithm, mainly because of their simplicity of implementation. To
implement this model, we use the neural network toolbox of MATLAB [29]. Initially, to build
the neural network model, the database is organized. A large number of datasets of the input
parameters are used as the input rows, while the risk value is the output row of the matrix.
The output of the framework contains the target datasets, which have a direct relationship
between the risk value and the input variables, namely virulence, immunity, temperature,
population, and ventilation.

In the second step, the NN model is selected by comparing different transfer functions
on the basis of the minimum mean square error (MSE), as shown in Table 8.2. A lower MSE is
better for the optimization. In MATLAB, the Tansig function is available (it has the second
lowest MSE after Softmax), so in the present research we used Tansig as the neuron transfer
function.

In feed-forward back-propagation, we used two hidden layers because this configuration gave
the least MSE and the maximum regression value, as shown in Table 8.3. To check
the correlation between the output and target data in the NN model, we used regression
analysis [29, 30]. A regression value of 1 indicates a close relationship between output and
target data, and a value of 0 indicates that the relationship between output and target is
random. Table 8.4 gives the details of the training and test datasets of the neural network
model, together with the number of neurons and the transfer function of each layer. The input
features and their ranges are shown in Table 8.5.
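
The two selection criteria can be computed as in the following Python sketch. It assumes that
the regression value R is the Pearson correlation coefficient between network outputs and
targets, as in a linear regression of one on the other; the data values and the names mse and
regression_r are hypothetical.

```python
import math


def mse(outputs, targets):
    """Mean square error between network outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def regression_r(outputs, targets):
    """Pearson correlation coefficient between outputs and targets."""
    n = len(targets)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_o * var_t)


targets = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical risk values
good = [1.1, 1.9, 3.2, 3.9, 5.1]      # outputs that track the targets
poor = [3.0, 1.0, 3.5, 2.0, 3.0]      # outputs with little relation to them

print(mse(good, targets), regression_r(good, targets))
print(mse(poor, targets), regression_r(poor, targets))
```

A model is preferred when its MSE is small and its R is close to 1, which is the pattern the
comparison in Table 8.3 looks for.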
TABLE 8.2
List of Transfer Functions with MSE

Sl. No.    Transfer Function    MSE
1          Linear Tanh          2.21E-7
2          Softmax              0.99E-7
3          Bias                 2.01E-7
4          Linear               3.66E-7
5          Tanh                 2.62E-7
6          Logsig               1.23E-7
7          Linear sigmoid       2.54E-7
8          Sigmoid              2.47E-7
9          Axon                 2.22E-7
10         Tansig               1.01E-7

TABLE 8.3
Feedforward Back-Propagation with MSE and R

No. of Hidden Layers    MSE       R
1                       9.6501    0.61672
2                       1.63      0.99691
3                       4.13      0.66979
4                       6.838     0.98365
5                       2.537     0.99142
6                       4.936     0.98833

TABLE 8.4
Summary of the Optimized Neuron

Database                Training datasets    144
                        Test datasets        18
No. of neurons in       1st layer            10
                        2nd layer            10
Transfer function of    1st layer            Tansig
                        2nd layer            Tansig
                        Output layer         Linear

TABLE 8.5
Input Range of ANN

Input    Virulence    Immunity    Temperature    Population    Ventilation
Min      4            5           15             300           18
Max      10           10          40             900           45

8.3.2 Performance of Training, Testing, and Validation of Network


In a neural network, the training process consists of tuning the weights and biases of the
network to reduce the mean square error over successive epochs, using the training
function and the input-output datasets. In this research we have utilized a back-propagation
based training function so that the network is trained over the whole
range of the input space. In the training process, the training datasets are used to compute
either the gradient of the MSE or the Jacobian of the errors with respect to the weights and
biases, which are then updated over consecutive epochs.

The MSE on the validation dataset is evaluated at the end of each epoch. Like the
training MSE, the validation MSE normally decreases during the initial phase of training
[31-33]. However, when the network begins to overfit the training datasets, the training MSE
may continue to decrease even though the validation MSE starts to increase. If the validation
MSE keeps increasing over successive epochs, the training process is terminated and the model
retains the weights and biases obtained at the minimum validation MSE. At a given
epoch, as long as the gradient magnitude has not fallen below the limit set by the user, the
network updates the weights and biases to reduce the MSE at the end of that epoch. Aside from
this, reaching the target training MSE, the maximum training time, or the maximum number of
epochs are the other criteria for terminating the process [34]. The MSE of the test dataset
is also computed in every epoch, but it is not used as a terminating
condition. If the test dataset reaches its minimum MSE at a significantly different
epoch than the validation dataset, this indicates a poor division
of the datasets.
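
The validation-based stopping rule can be sketched in Python as follows. It is a minimal
sketch that assumes training stops after a fixed number of consecutive epochs without
improvement in the validation MSE; the validation curve, the default of 6 failures, and the
function name early_stopping_epoch are hypothetical.

```python
def early_stopping_epoch(val_mse, max_fail=6):
    """Return (best_epoch, stop_epoch) for a sequence of validation MSE values.

    Training stops once the validation MSE has failed to improve for
    max_fail consecutive epochs; the weights from best_epoch are kept.
    Epochs are numbered from 1.
    """
    best_value = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, value in enumerate(val_mse, start=1):
        if value < best_value:
            best_value, best_epoch, fails = value, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_mse)  # ran out of epochs first


# Hypothetical validation curve: improves, then rises as overfitting sets in.
val_curve = [0.90, 0.50, 0.30, 0.22, 0.20, 0.21, 0.23, 0.26, 0.30, 0.35]
print(early_stopping_epoch(val_curve, max_fail=3))
```

For this curve the minimum is at epoch 5 and three consecutive failures are reached at
epoch 8, so training stops there and the epoch 5 weights are retained.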


8.3.3 Performance Evaluation of Training Functions 


In this section, the performance of various training functions used in the feed-forward
BP algorithm, namely BFG, BR, CGB, CGF, CGP, GD, GDM, GDA, GDX, LM, OSS, R,
RP, and SCG, is assessed [35-38]. A total of 162 experimental samples are used
in the input-output target datasets for developing the ANN-based model of clinical and
epidemiological factors for COVID-19. These samples have been randomly
divided such that the numbers of samples used for training and testing are 144 and 18,
respectively.

Figure 8.1 shows the neural network model architecture applied in the present process,
which consists of five input parameters (virulence, immunity, temperature, population, and
ventilation) and one output parameter (risk factor). In this architecture, the numbers of
neurons in the output layer and hidden layer are 1 and 10, respectively. Figure 8.2
represents the regression plot for the training, testing, and validation datasets of the NN
model. In each case the regression value (R) approaches unity, which means the model is a
good fit for predicting the datasets [39, 40]. Figure 8.3 shows the properties of the present
neural network model: the performance function, the number of layers (hidden layer,
output layer), their transfer functions, the training function, and the adaptation
learning function used to predict the model accurately. Figure 8.2 represents the training
information and training parameters, such as the number of epochs, minimum gradient, and
maximum number of failures, for the NN model.

The variation of the regression value (R) and MSE with respect to the number of nodes is
shown in Figures 8.3 and 8.4, respectively. From the graphs it is seen that R approaches
unity and MSE decreases as the number of nodes of the present NN model increases.
FIGURE 8.1
Properties of neural network diagram.

FIGURE 8.2
Training information and training parameters of neural network model.

Plot of MSE versus number of nodes.

TABLE 8.6
MSE and R for Purelin Transfer Function

Training Function    No. of Iterations    MSE     R
Traingdm             6                    1.22    0.25134
Traingda             46                   0.21    0.8434
Traingdx             28                   0.41    0.9513
Trainlm              6                    0.19    0.9734
Trainoss             13                   0.35    0.9328
Trainr               157                  0.95    -0.2623
Trainrp              32                   0.41    0.9314
Trainscg             28                   0.28    0.9451


Table 8.6 lists the values of MSE and R for eight different training functions under
the same adaptation learning function (learngdm) and transfer function (Purelin). Similarly,
Table 8.7 lists the values of MSE and R for eight different training functions under
the same adaptation learning function (learngdm) and transfer function (Tansig). In both
cases it is seen that the Trainlm training function gives the best fit for predicting the
model.
TABLE 8.7
MSE and R for Tansig Transfer Function

Training Function    No. of Iterations    MSE      R
Traingdm             6                    1.35     0.37648
Traingda             29                   0.625    0.9804
Traingdx             28                   2.37     0.06281
Trainlm              6                    0.105    0.99818
Trainoss             13                   5.57     0.5176
Trainr               157                  11.95    0
Trainrp              23                   0.903    0.9742
Trainscg             49                   0.682    0.9915


8.4 Results and Discussion 


In this section, the detailed results of using the ANN model to detect whether a person is
affected by COVID-19 are shown and discussed. Figure 8.5
shows the neural network model (built in MATLAB) used to detect whether a person is COVID-19
positive based on input symptoms and features. Tables 8.8 and 8.9 present the RMSE
and accuracy of the proposed model for predicting the risk value; the accuracy is
95.77% for cross validation and 98.02% for the testing dataset.

After the training process of the neural network model is finished, the model
is exposed to a variety of input parameters. The output of this process is
shown in Figures 8.5 and 8.6. Figure 8.5 presents the comparative study of actual risk
TABLE 8.8
RMSE and Accuracy for Cross Validation

Parameters    RMSE (%)    Accuracy (%)
Risk value    4.23        95.77

TABLE 8.9
RMSE and Accuracy for Testing Dataset

Parameters    RMSE (%)    Accuracy (%)
Risk value    1.98        98.02

FIGURE 8.6
Relative error of risk value in test dataset versus number of trials.

FIGURE 8.7
Plot for performance versus number of epochs.


value and calculated risk value for the testing dataset, while Figure 8.6 presents the
relative error versus the number of testing trials.

Figure 8.7 presents the performance versus the number of epochs for the present NN
model. After proper tuning of the present model, the weights and biases of the NN layer are
presented in Figure 8.8.
[Figure 8.8 screenshot: MATLAB weight and bias viewer with "iw{1,1} - Weight to layer 1 from
input 1" selected. Visible rows of the weight matrix:]

-13.7664    8.325      -21.6583   -2.7315    -21.7751
-39.36      1.9726     0.90152    7.9454     -13.9186
25.7517     -26.1076   6.209      -28.9386   -5.7158
-25.9609    -21.0493   14.3667    -10.9316   22.3061
-9.121      10.5506    -12.0818   -29.2203   7.5286
17.993      10.2372    -2.7379    13.9955    -3.7221
0.51135     -0.14941   1.0152     -0.27535   1.4552

FIGURE 8.8
Bias weighted value for the present NN model.


8.5 Conclusions 


For this research, five different combinations of the algorithm along with 10 different
transfer functions of the neurons are utilized. Among them, a feed-forward back-propagation
NN model with the Learngdm adaptation learning function and the Tansig transfer function is
used because it gives the maximum value of R and the minimum value of MSE, and hence the
maximum degree of accuracy. The model is designed with two hidden layers. The proposed
model fulfilled the objective for both the cross-validation and testing datasets with a
high degree of accuracy.
9


Fuzzy Logic Based Parametric Optimization
Technique of Electro Chemical Discharge
Micro-Machining (μ-ECDM) Process during
Micro-Channel Cutting on Silica Glass


9.1 Introduction 


The need for complex miniature components with better technical precision has
grown over the years as a result of technological advancements in both conductive and
nonconductive materials. Nonconductive materials, in particular silica,
are widely employed for miniature parts. Engineers, however, face a difficult problem in
machining these materials with high dimensional accuracy. Because of the intrinsic hardness
and brittleness of these materials, traditional micromachining techniques such as
micro-drilling, micro-milling, and micro-grinding cannot fabricate complex micro features
within the tolerance level.

Multiple nontraditional machining approaches have been used to solve the problems of
conventional machining and have successfully machined materials regardless of their
mechanical and chemical properties, shaping them into the necessary forms while maintaining
surface quality. The most widely used and marketed procedures for advanced
micro-manufacturing are micro-electrochemical machining (μ-ECM) and micro-electro discharge
machining (μ-EDM). However, there are several inherent issues with μ-ECDM,
such as poor surface quality, a decreased material removal rate (MRR), a high rate of tool
wear, thermally driven cracks, and the production of recast layers.

There are several research work was done for the fabrication of miniature parts using 
electrochemical discharge micro-machining process (u-ECDM). Singh and Singh [1] illus- 
trated the influence of tool feed rate, tool rotation, and duty cycle on material removal 
rate and overcut of a machined hole and optimized those machining criteria using a 
novel combined entropy-VIKOR method. A Taguchi's methodology used electrolyte con- 
centration, applied voltage, and inter-electrode gap as process parameters on the output 
characteristics of material removal rate (MRR) and overcut rate. A multiobjective process 
optimization grey relational analysis (GRA) method is used for achieving the optimum 
response variables [2]. Mallick et al. [3] explore the effects of various process parameters 
of y-ECDM process based on the relationship between the machining parameters such as 
electrolyte concentration (wt%), applied voltage (V), width of cut (WOC), surface rough- 
ness (Ra), tool shapes on heat affected zone (HAZ) and material removal rate (MRR). 
A comparative study of RSM-based GA and RSM-based PSO was performed to obtain the optimum
process parameters in an ECDM process [4]. The multi-response optimization method TOPSIS
was used to analyze the response variables overcut and material removal rate (MRR) for the
input process parameters electrolyte concentration, applied voltage, and inter-electrode
gap [1]. Single- as well as multiobjective optimization using a genetic algorithm (GA) was
applied to obtain suitable optimal parameters in an ECDM process [5]. Moderate overcut and
the highest MRR were observed for a suspended electrolyte with a stirrer effect [6].
Chen et al. [7] added a suitable ultrasonic amplitude and found that long-term machining
performance was greatly enhanced. To control the operative gap and maintain stable gas
films directly below the micro-tool, Singh and Dvivedi [8] employed a pressurized feeding
system and abrasive-coated tooling. According to Wang et al. [9], as the DC voltage
increased, the MRR and surface resilience first increased and then declined slightly.
In [10], a bending force was applied to a micro-tool by means of magnetism, and it was
shown to decrease as the electrolyte concentration and applied voltage were raised.
Han et al. [11] enhanced step milling depth and produced micro-grooves with a high
aspect ratio using the ECDM technique. The effects of control parameters on silicon
carbide during micro-drilling were investigated in [12]. By combining sodium hydroxide
(NaOH) and potassium hydroxide (KOH), Sabahi and Razfar [13] increased the accuracy of
electrochemical discharge machining (ECDM). Yadav [14] examined the prospects for
electrochemical spark machining (ECSM) research in the current state of the field and
created intricate profiles with improved surface quality.
Tang et al. [15] examined the effects of current pulses on the gas film and found that,
before a full gas film developed, a sizable bubble was created around the electrode as a
result of gas generation and bubble flocculation. According to the study, the gas film
migrated upward at a mean speed of 1.03 m/s. Oza et al. [16] employed Taguchi robust
design with an L9 orthogonal array to identify the most appropriate parameter settings
for surface texture and kerf width. The tool for the travelling wire electrochemical
discharge machining was a wrapped wire with a 0.15 mm diameter. Bindu Madhavi and
Hiremath [17] used a 370 μm diameter stainless steel (SS) tool to machine a micro-channel
on 4 mm thick quartz glass using ECDM, taking electrolyte concentration (wt% C), voltage
(V), and duty factor (% DF) levels into consideration. Bellubbi and Mallick [18] showed
how surface imperfections increase as stand-off distance rises. In the present research,
fuzzy logic based AI techniques are used to identify the response variables corresponding
to the input variables and to predict the optimum response variables for a new set of
testing data. Fuzzy logic has been applied in several earlier studies, such as intelligent
flow measurement [19, 20], turbidity measurement [21], and performance analysis of a flow
sensor [22]. An improved version of the elephant swarm water search algorithm (ESWSA) was
applied to obtain the optimum input parameters on an Aluminum 6061T6 plate [23].

The literature review shows that researchers have pursued a variety of objectives;
however, a key focus can be placed on micro-profile or micro-channel cutting on
electrically insulating materials, such as glass, employing the ECDM technique to increase
machining depth by combining a mechanized spring feed operation with a Z-axis motor. The
rest of this chapter is organized as follows: the micro-ECDM set-up is described in
Section 9.2; the methodology, experimental results, fuzzy model implementation, and result
analysis are explained in Section 9.3; and the conclusions are given in Section 9.4.


FIGURE 9.1
Diagram of μ-ECDM set-up [18].


9.2 Development of the Set Up 


To pursue the goals of this study and to control process variables such as pulse-on time
(PT), machining voltage (V), and electrolyte concentration (EC), an experimental ECDM
set-up was designed and developed in-house. The set-up is depicted in Figure 9.1 and
includes an automated spring feed system for tool movement and fixation along the X-Y-Z
axes.


9.3 Experimental Methodology and Result Analysis 


The input parameters are electrolyte concentration (EC) (10, 17.5, 25 wt%), pulse-on time
(PT) (40, 50, 60 μs), applied voltage (V) (45, 50, 55 V), and stand-off distance (SD)
(0.5, 1, 1.5 mm). Each experiment was carried out three times under identical machining
conditions, and the average values of the experimental findings are reported. The fixed
specifications were a tungsten tool tip diameter of 200 μm, an inter-electrode gap (IEG)
of 30 mm, and a pulse frequency of 50 Hz. Slip gauges were inserted between the workpiece
and the tungsten tool tip to measure the stand-off distance. Table 9.1 shows the
experimental design and test outcomes. Of the 27 datasets, 24 were used to construct the
fuzzy logic model with four input variables and three output variables, and the remaining
three were used as testing datasets for validation.


TABLE 9.1
Training and Testing Experimental Dataset [18]


Run     Voltage   Pulse-on Time   Stand-off Distance   Electrolyte Concentration   MRR       OC        MD
Order   (V)       (μs)            (mm)                 (wt%)                       (mg/hr)   (μm)      (μm)
        (x1)      (x2)            (x3)                 (x4)

1       45        40              1                    17.5                        72.93     172.328   291.722
2       55        40              1                    17.5                        88.65     263.234   354.65
3       45        60              1                    17.5                        86.413    186.43    345.642
4       55        60              1                    17.5                        102.42    301.1     409.678
5       50        50              0.5                  10                          77.56     210.21    310.25
6       50        50              1.5                  10                          56.87     246.797   220.48
7       50        50              0.5                  25                          83.96     155.24    335.86
8       50        50              1.5                  25                          82.24     226.447   328.957
9       45        50              1                    10                          45.55     118.365   182.2
10      55        50              1                    10                          90.445    225.1     361.82
11      45        50              1                    25                          88.45     106.356   353.83
12      55        50              1                    25                          76.56     310.12    306.24
13      50        40              0.5                  17.5                        84.34     154.856   337.37
14      50        60              0.5                  17.5                        98.22     227.494   392.88
15      50        40              1.5                  17.5                        80.38     225.23    321.52
16      50        60              1.5                  17.5                        91.15     319.584   364.54
17      45        50              0.5                  17.5                        77.47     102.299   309.89
18      55        50              0.5                  17.5                        90.844    292.867   363.376
19      45        50              1.5                  17.5                        68.42     213.223   273.78
20      55        50              1.5                  17.5                        84.36     294.494   337.33
21      50        40              1                    10                          56.44     170.265   225.76
22      50        60              1                    10                          82.98     208       331.92
23      50        40              1                    25                          87.32     248.376   349.23
24      50        60              1                    25                          85.24     269.243   340.97
25      50        50              1                    17.5                        67.18     244.212   268.75
26      50        50              1                    17.5                        66.12     247.278   264.48
27      50        50              1                    17.5                        65.82     242.278   263.39



9.3.1 Effects of Process Parameters on MRR, OC, and MD 


Figure 9.2 represents the fuzzy inference system (FIS) for the present model, in which the
four inputs are pulse-on time (PT), electrolyte concentration (EC), applied voltage (V),
and stand-off distance (SD), and the response variables are OC, MRR, and MD.

Figures 9.3-9.6 present the triangular membership functions of the input variables applied
voltage (V), pulse-on time (PT), stand-off distance (SD), and electrolyte concentration
(EC). Each input variable has three membership functions, giving 3^4 = 81 possible rule
combinations. Figures 9.7-9.9 present the membership functions of the output variables
MRR, OC, and MD. Figure 9.10 indicates the possible logic statements between four input


FIGURE 9.2
FIS model for input and output process variables.


FIGURE 9.3
Membership function for voltage.


FIGURE 9.4
Membership function for pulse on time.


FIGURE 9.5
Membership function for stand-off distance.


variables and three output variables. Figures 9.11-9.16 show the 3D surface views of each
output variable with respect to the input variables. The following conclusions can be
drawn from the 3D views. Increases in pulse-on duration, applied voltage, and electrolyte
concentration all result in a rise in MRR, whereas an increase in stand-off distance
results in a drop in MRR. When micro-machining is performed with the aim of raising the
sparking rate, the critical voltage and threshold voltage, which both reflect the impact
of pulse-on time, have a significant effect on machinability. To analyze the parametric
effects on the machining criteria, two parameters are varied while the remaining
parameters are


FIGURE 9.6
Membership function for electrolytic concentration.


FIGURE 9.7
Membership function for MRR.


held constant at their set values. Increases in pulse-on duration, applied voltage, and
electrolyte concentration all result in an increase in overcut (OC), whereas drops in
stand-off range result from slowing spark rates. Side sparking and OC both increase when
the stand-off distance is increased. It is also evident that the sparking rate rises when
the applied voltage and pulse-on time rise. At 50 V, consistent sparking is produced,
which improves surface properties. An electrolyte concentration of 17.5 wt% also
accompanies this, and it lowers if stand-off distance (SD) is increased because the
sparking rate is lessened.


FIGURE 9.8
Membership function for OC.


FIGURE 9.9
Membership function for MD.
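

The triangular membership functions and the 81-combination rule grid described in this
subsection can be illustrated with a minimal Python sketch (the chapter itself works in
MATLAB). The membership parameters for voltage below are assumptions chosen only to span
the 45-55 V levels; the chapter does not list the actual values, and the names trimf,
fuzzify, and VOLTAGE_MFS are hypothetical.

```python
from itertools import product


def trimf(x, a, b, c):
    """Triangular membership grade of x for feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed (hypothetical) membership parameters for applied voltage in volts.
VOLTAGE_MFS = {
    "low": (40.0, 45.0, 50.0),
    "medium": (45.0, 50.0, 55.0),
    "high": (50.0, 55.0, 60.0),
}


def fuzzify(x, mfs):
    """Return the membership grade of x for every linguistic label."""
    return {label: trimf(x, *abc) for label, abc in mfs.items()}


# Four inputs with three labels each give 3**4 candidate rule antecedents.
INPUTS = ("voltage", "pulse_on_time", "stand_off_distance", "electrolyte_concentration")
RULE_ANTECEDENTS = list(product(("low", "medium", "high"), repeat=len(INPUTS)))

if __name__ == "__main__":
    print(fuzzify(47.5, VOLTAGE_MFS))
    print(len(RULE_ANTECEDENTS))
```

A voltage halfway between two levels belongs equally to the two neighbouring labels, which
is what lets the rule base interpolate between the tested parameter levels.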


9.3.2 Determination of Optimized Condition 


Figures 9.11-9.16 show the parametric combinations for minimum OC, maximum MRR, and
maximum MD. From Figures 9.11 and 9.14, the maximum MRR is obtained at the maximum levels
of applied voltage and pulse-on time and at medium electrolytic concentration and
stand-off distance. From Figures 9.12 and 9.15, OC is minimum for minimum supply voltage,
maximum electrolytic concentration, and medium stand-off distance and pulse-on time. From
Figures 9.13 and 9.16, MD is maximum for maximum pulse-on time and medium voltage,
electrolytic concentration, and stand-off distance.


FIGURE 9.10
Fuzzy inference rule between process input and output variables.


FIGURE 9.11
3D surface view for voltage, pulse-on time, and MRR.


FIGURE 9.12
3D surface view for pulse-on time, voltage, and OC.


FIGURE 9.13
3D surface view for pulse-on time, voltage, and MD.


FIGURE 9.14
3D surface view for stand-off distance, electrolytic concentration, and MRR.


FIGURE 9.15
3D surface view of electrolytic concentration, stand-off distance, and OC.


FIGURE 9.16
3D surface view of electrolytic concentration, stand-off distance, and MD.


FIGURE 9.17
Experimental and calculated value of test dataset for MRR.


Figures 9.17-9.19 present a comparison of the experimental and calculated values of the
response variables MRR, OC, and MD for the test dataset. In each of the graphs, the
calculated results follow the experimental results. Table 9.2 shows that the RMSE of the
response variables is 3.68%, 5.64%, and 6.45% for MRR, OC, and MD, respectively.


FIGURE 9.18
Experimental and calculated value of test dataset for OC.


FIGURE 9.19
Experimental and calculated value of test dataset for MD.
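

The chapter reports RMSE as a percentage but does not state the formula used. One common
convention, assumed here, takes the root mean square of the relative errors between
experimental and calculated values. The Python sketch below uses that assumption; the
function name rmse_percent and the sample numbers are hypothetical and are not the
chapter's test data.

```python
import math


def rmse_percent(experimental, calculated):
    """Root mean square of relative errors, expressed in percent (assumed definition)."""
    if len(experimental) != len(calculated) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((e - c) / e) ** 2 for e, c in zip(experimental, calculated)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical values used only to exercise the function.
    print(rmse_percent([100.0, 200.0], [110.0, 190.0]))
```

If the authors instead normalized by the range or the mean of the response, the numbers
would differ, so the definition should be confirmed before comparing against Table 9.2.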


TABLE 9.2
RMSE for the Test Dataset

Name of Response Variable   MRR (%)   OC (%)   MD (%)

RMSE                        3.68      5.64     6.45


9.4 Conclusions 


The following findings were obtained from the parametric study of channel cutting on
silica glass using an ECDM set-up. The analysis was based on extensive experimental
observations and desirability function analysis. The research consists of two parts: in
the first part, the optimal process parameters corresponding to the independent input
variables discussed in the previous section are obtained, and in the second part, the
testing dataset is predicted. From these two research outputs, the following conclusion is
drawn. Raising the voltage, electrolyte concentration, and pulse-on time increases the
machining depth, MRR, and OC; however, when stand-off distance is raised, these parameters
drop.


Study of ANFIS Model to Forecast the Average
Localization Error (ALE) with Applications
to Wireless Sensor Networks (WSN)


10.1 Introduction 


A wireless sensor network (WSN) is a self-organizing network made up of numerous small,
inexpensive sensor nodes that can track changes in physical or environmental
characteristics [1-3]. Its practical applications include energy harvesting [4], health
monitoring [5], precision farming [6], target tracking [7], transportation management [8],
global-scale wildlife monitoring [9], environmental monitoring [10], and business and
home automation [9, 11]. The majority of applications call for these sensors to estimate
their coordinates precisely while using few resources. Sensors with a built-in Global
Positioning System (GPS) device can find their coordinates quickly [12]; however, due to
its size and expense, GPS cannot feasibly be incorporated into all sensors. A number of
algorithms have therefore been applied to localize unknown nodes with the help of anchor
nodes [13, 14].

To address various localization issues, numerous localization methods have been developed
[15]. These algorithms must be adaptable in order to function successfully in a wide range
of indoor and outdoor settings and topologies. The four types of localization protocol
used in WSNs are range-free algorithms, range-based algorithms, centralized algorithms,
and decentralized algorithms. In the range-based approach, nodes determine their positions
by estimating their distance or angle relative to anchor nodes. Several measurement
techniques, such as time of arrival (ToA), angle of arrival (AoA), time difference of
arrival (TDoA), and received signal strength indicator (RSSI), are used to acquire these
estimates [16-18].

Due to the limitations of sensor hardware, range-free localization techniques are a
cost-effective alternative. Range-free localization adopts two different protocols:
(1) centroid methods, which rely on a high density of sensor nodes with known locations
[19, 20], and (2) hop-based techniques, which rely on flooding the network with
connectivity data such as hop count [21, 22].

The centralized approach requires data to be transmitted to a central node in order to
compute the location of the mobile node. Because each sensor node has limited power and
the information must travel over many hops, any connection to a centralized computing
facility is expensive [23, 24]. Additionally, delivering time-series data over a network
introduces latency and consumes more energy and network bandwidth. Decentralized
localization techniques, on the other hand, require fewer connections between sensor
nodes, which reduces the WSN's power usage. Decentralized localization frameworks need
hardware-based components that connect with each mobile target in order to collect
localization data from reference nodes [25, 26].

In recent years, AI techniques have become increasingly important in engineering
optimization [27-35]. This can be explained by the fact that they perform better than
traditional mathematical optimization methods while using significantly less computing
power and memory. Modeling WSN node localization typically involves solving a
multidimensional optimization problem. Various AI techniques have been presented for
range-based approaches in an effort to design a less complex algorithm [36]. A two-step AI
model has been used for accurate localization in a WSN in which a range-free localization
strategy based on received signal strength is applied. In the first approach, nodes are
localized using a fuzzy logic system (FLS) by adding the edge weights of each anchor node,
and a genetic algorithm (GA) is then used to calculate the ideal edge weights. The second
approach employs a neural network (NN), where the input is the received signal strength
and the output is the approximate position of the wireless nodes. The simulation results
obtained with the NN are compared with those obtained with the FLS and GA [34]. In
[37, 38], the authors suggested a range-free localization strategy for WSNs based on a
weighted centroid localization technique. Constructing the membership functions for RSSI
and link quality using FL [39] requires a selection of edge weights; therefore, a combined
Mamdani-Sugeno FL inference is used in the fuzzy phase. In terms of localization
precision, this approach outperforms the traditional centroid method.

The precision of these localization techniques is evaluated using the ALE metric. To tune
the ALE below an appropriate threshold, an algorithm is needed that yields the minimum ALE
for the ideal network parameters. To address this, we have developed an AI method for
precise and very fast prediction of ALE. Basic localization of WSN nodes is shown in
Figure 10.1.

Two alternative ANFIS models, namely the grid partition ANFIS model and the subclustering
ANFIS model [40, 41], are presented in this chapter. Four input features (anchor ratio,
transmission range, node density, and number of iterations) are taken from the
experimental dataset. The rest of the chapter is organized as follows. The system model
for the node localization problem is covered in Section 10.2. The ANFIS model and its
implementation are described in Section 10.3. Result analysis is explained in Section
10.4, followed by conclusions in Section 10.5.


FIGURE 10.1
Localization of WSN nodes.


10.2 System Model


The system architecture created for the node localization procedure is discussed in this
section. The process for calculating the distance between the anchor and unknown nodes is
covered first. We then discuss how the objective function of the updated node localization
algorithm was formed and how it operates. Finally, the two ANFIS models designed for this
specific purpose are presented.


10.2.1 Distance Calculation for Generalization of Optimization Problem 


The unknown nodes determine their distance from the anchor nodes using the RSSI. Due to
multipath fading and shadowing [42], sensors experience a power loss during information
transmission. The path loss at a given distance is a function of the path loss at a
reference distance d0 and of the distance between the transmitter (TX) and receiver (RX).
A further component, the path loss exponent α, also affects the path loss; it indicates
how quickly the TX signal fades as the distance between TX and RX increases. The value of
α ranges from 2 to 6 depending on a variety of physical factors, including antenna height,
signal frequency, and propagation conditions [43]. A range error is always present when
determining the position of the unknown nodes. To evaluate the unknown nodes' positions as
precisely as feasible, this inevitable range error is taken into account.
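
The chapter describes the path loss relation in words only. A widely used form that
matches that description is the log-distance model, PL(d) = PL(d0) + 10 α log10(d / d0),
which is assumed in the Python sketch below (shadowing noise is omitted). The function
names and the sample values are hypothetical, and the authors' exact model may differ.

```python
import math


def path_loss_db(d, pl_d0, alpha, d0=1.0):
    """Path loss in dB at distance d under the assumed log-distance model."""
    if d <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    return pl_d0 + 10.0 * alpha * math.log10(d / d0)


def distance_from_path_loss(pl, pl_d0, alpha, d0=1.0):
    """Invert the model: estimate the TX-RX distance from a measured path loss."""
    return d0 * 10.0 ** ((pl - pl_d0) / (10.0 * alpha))


if __name__ == "__main__":
    # Hypothetical numbers: 40 dB loss at the 1 m reference distance, alpha = 3.
    loss = path_loss_db(15.0, pl_d0=40.0, alpha=3.0)
    print(loss, distance_from_path_loss(loss, pl_d0=40.0, alpha=3.0))
```

In practice the measured loss includes fading and shadowing, so the recovered distance
carries the range error that the localization step has to tolerate.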

The objective function (OF) to be minimized is the mean of the squared errors between the
distances from the evaluated node coordinates to the nearby anchor nodes and the estimated
distances from the actual unknown node to those anchor nodes. At least three anchor nodes
should be within the transmission range of an unknown node in order to calculate its
localization error. The evaluated position of the unknown node is the one that gives the
lowest value of the OF.
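
The objective function described above can be sketched in Python as follows. This is an
illustration under the assumption of a two-dimensional deployment; the function name
objective and the sample coordinates are hypothetical, and the search procedure that
minimizes the OF is not shown.

```python
import math


def objective(candidate, anchors, estimated_distances):
    """Mean squared error between candidate-to-anchor distances and estimated ranges."""
    if len(anchors) < 3:
        raise ValueError("at least three anchors in range are required")
    if len(anchors) != len(estimated_distances):
        raise ValueError("one estimated distance is needed per anchor")
    x, y = candidate
    errors = [
        (math.hypot(x - ax, y - ay) - d) ** 2
        for (ax, ay), d in zip(anchors, estimated_distances)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    true_position = (3.0, 4.0)
    # Error-free ranges, used here only to show that the OF vanishes at the true position.
    ranges = [math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors]
    print(objective(true_position, anchors, ranges))
    print(objective((5.0, 5.0), anchors, ranges))
```

With noisy ranges the minimum is no longer zero, and the residual at the best candidate
is what contributes to the localization error.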


10.2.2 Simulation Setup 


All the sensor nodes are distributed randomly in a 100 × 100 m² square area. Different
sensor node densities are implemented: 100, 200, and 300. The anchor node ratio is set to
14-30. The communication range for all sensor nodes is set to 15-25 m, and the number of
iterations is set to 14-100. Before these parameters are set, it is assumed that there is
no localization error for any unknown node.
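
A random deployment of this kind can be sketched in Python as below. The sketch assumes a
uniform distribution over the square and treats the anchor ratio as a fraction of all
nodes; both are assumptions, and the function name deploy and the seed are hypothetical.

```python
import random


def deploy(n_nodes, anchor_ratio, side=100.0, seed=0):
    """Place n_nodes uniformly at random in a side x side square and split off anchors."""
    if not 0.0 <= anchor_ratio <= 1.0:
        raise ValueError("anchor_ratio is expected as a fraction between 0 and 1")
    rng = random.Random(seed)
    nodes = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n_nodes)]
    n_anchors = round(n_nodes * anchor_ratio)
    # The first n_anchors positions are treated as anchors, the rest as unknown nodes.
    return nodes[:n_anchors], nodes[n_anchors:]


if __name__ == "__main__":
    anchors, unknowns = deploy(n_nodes=100, anchor_ratio=0.2)
    print(len(anchors), len(unknowns))
```

Fixing the seed makes a run repeatable, which helps when comparing settings such as node
density or anchor ratio on the same layout.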


10.2.3 Experimental Results and Performance Analysis 
10.2.3.1 The Effect of Anchor Density 


Anchor density is a key factor determining the effectiveness and cost of localization in a
WSN [44]. This subsection assesses how anchor density affects the effectiveness of
localization. In turn, 10%, 20%, 30%, 40%, 50%, and 60% of all sensor nodes are designated
as anchors. Each sensor node's communication range is 25 m. In earlier research, different
anchor ratios (AR) for various node densities were used to evaluate the ALE and the
confidence interval of location error (CILE). Increasing the anchor ratio from 10% to 40%
greatly improves the ALE [45, 46], because increasing the number of anchor nodes increases
the number of unknown nodes that can be localized within their communication range. The
effect on the average localization error, however, becomes negligible as the anchor ratio
keeps growing. In other words, it may not always be necessary to add more anchor nodes,
which are expensive and require specialized hardware.


10.2.3.2 The Effect of Communication Range 


Communication range is another important parameter that determines the localization error
of an unknown node and the energy consumption of sensor nodes [47]. Here, 20% of the
sensor nodes are set as anchor nodes, and the communication range is varied from 10 m to
25 m. According to earlier research, when the communication range is only around 20 m, the
ALE is slightly larger as a result of the network's structure and the lack of connectivity
for many nodes. However, as more anchor information becomes available for localizing
unknown nodes, the ALE is greatly reduced by expanding the communication range [49]. As
node density increases, the average localization error also decreases. This is because, as
node density rises, the number of anchor nodes accessible within communication range
rises, as does the network connectivity between sensor nodes. The localization success
ratio may vary depending on the node density and communication range [49, 50]. According
to the experimental work, when the communication range is approximately 10 m and the node
density is 100, a localization success ratio of roughly 20.8% may be reached. Because more
anchor information is available within the communication range, the localization success
ratio clearly improves as the communication range grows, and the unknown nodes become
easier to locate. The choice of communication range is influenced by the localization
requirements, such as localization accuracy, localization success ratio, and the energy
constraints of sensor nodes.
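
The chapter does not define the localization success ratio explicitly. The Python sketch
below assumes that an unknown node counts as localizable when at least three anchors lie
within its communication range, in line with the three-anchor requirement stated in
Section 10.2.1. The function names and sample coordinates are hypothetical.

```python
import math


def anchors_in_range(node, anchors, comm_range):
    """Count the anchors whose Euclidean distance to node is within comm_range."""
    return sum(1 for anchor in anchors if math.dist(node, anchor) <= comm_range)


def localization_success_ratio(unknowns, anchors, comm_range, min_anchors=3):
    """Fraction of unknown nodes with at least min_anchors anchors in range (assumed definition)."""
    if not unknowns:
        return 0.0
    localizable = sum(
        1 for node in unknowns
        if anchors_in_range(node, anchors, comm_range) >= min_anchors
    )
    return localizable / len(unknowns)


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    unknowns = [(3.0, 3.0), (90.0, 90.0)]
    for comm_range in (5.0, 15.0):
        print(comm_range, localization_success_ratio(unknowns, anchors, comm_range))
```

Widening the range in this toy layout brings more anchors within reach of the first node,
which mirrors the trend reported above.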


10.3 Adaptive Neuro-Fuzzy Inference Architecture 


The ANFIS strategy is used to handle nonlinear and complex problems [50, 51]. In an ANFIS
hybrid intelligent system, a simple dataset produces the desired output of a fuzzy logic
controller through interconnected neural network processing elements linked by weighted
connections. ANFIS combines the strengths of two intelligent strategies, the fuzzy logic
controller (FLC) and the neural network (NN), into a single method [52, 53]. In an ANFIS
model, the parameters of a FIS are tuned by neural network learning strategies. A
five-layer ANFIS structure is shown in Figure 10.2. ANFIS has the following
characteristics:


* It refines fuzzy if-then rules to describe the behavior of a nonlinear and complex
system.
* It does not require prior human expertise.
* It is simple to implement.
* It enables fast and accurate learning and reasoning.
* With a proper choice of membership functions, strong generalization, and clear
explanation through fuzzy rules, it offers the desired output.
* It is simple to combine both linguistic and numeric information for problem solving.


FIGURE 10.2
ANFIS architecture for five layers.


FIGURE 10.3 
Firing of rules for different membership functions. 


Figure 10.2 shows the ANFIS architecture implemented with two rules, where fixed nodes and
adaptive nodes are shown as circles and squares, respectively. Figure 10.3 represents a
first-order Sugeno model of the ANFIS architecture. In Layer 1, every node is adaptive,
and the outputs of Layer 1 are the fuzzy membership grades of the inputs. In Layer 2, the
nodes are fixed and combine the membership grades using the fuzzy AND operator. The symbol
Π denotes a simple multiplier. The outputs of Layer 2 are the firing strengths of the
rules. In Layer 3, fixed nodes labeled N normalize the firing strengths from the previous
layer. In Layer 4, the nodes are adaptive, and the output of every node is the product of
the normalized firing strength and a first-order polynomial. Layer 5 consists of a single
fixed node, Σ, which sums all incoming signals.
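
The five layers can be traced in a short Python sketch of a forward pass for a two-input,
two-rule, first-order Sugeno system. Gaussian membership functions are assumed here
because the chapter does not state the shape used at this point; the function names and
all parameter values are hypothetical.

```python
import math


def gauss(x, center, sigma):
    """Gaussian membership grade (assumed membership shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-rule first-order Sugeno ANFIS; returns (output, w_bar)."""
    # Layer 1 (adaptive): membership grades of each input.
    mu_a = [gauss(x, c, s) for c, s in premise["A"]]
    mu_b = [gauss(y, c, s) for c, s in premise["B"]]
    # Layer 2 (fixed, product node): firing strength of each rule via AND as product.
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3 (fixed, N node): normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4 (adaptive): normalized strength times first-order polynomial p*x + q*y + r.
    rule_out = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5 (fixed, sum node): overall output.
    return sum(rule_out), w_bar


if __name__ == "__main__":
    premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
    consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
    print(anfis_forward(1.0, 1.0, premise, consequent))
```

At the midpoint between the two membership centers both rules fire equally, so the output
is the plain average of the two rule polynomials.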


10.3.1 Hybrid Learning ANFIS 


In the hybrid learning algorithm, the ANFIS model is trained by a combination of the least
squares method and the gradient descent method. In the forward pass, node outputs are
propagated up to Layer 4 and the least squares method is used to determine the consequent
parameters. In the backward pass, the gradient descent method propagates the error signals
back to the previous layers and updates the premise parameters. The hybrid learning
approach is faster than the original back-propagation technique because it reduces the
dimensions of the search space: the optimal values of the consequent parameters are
determined by the least squares method. If these parameters were also left free, the
search space of the learning process would increase and the convergence of training would
become slower. To tackle the issue of search space, the hybrid algorithm of ANFIS
therefore joins two techniques: (1) the least squares method, which is used to optimize
the consequent parameters, and (2) the gradient descent method, which is used to optimize
the premise parameters. The literature shows that the hybrid algorithm is highly efficient
in training ANFIS systems.
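
The least squares half of the hybrid scheme can be sketched in Python. With the premise
parameters held fixed, the ANFIS output is linear in the consequent parameters, so they
can be found by solving the normal equations. The sketch assumes one input, two rules, and
Gaussian memberships with fixed centers; the gradient descent update of the premise
parameters is not shown. All names and values are hypothetical.

```python
import math


def normalized_strengths(x, centers=(0.0, 2.0), sigma=1.0):
    """Layers 1-3 for one input: Gaussian grades normalized to sum to one."""
    w = [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]
    total = sum(w)
    return [wi / total for wi in w]


def design_row(x):
    """Regressors multiplying the consequents [p1, r1, p2, r2] for input x."""
    w1, w2 = normalized_strengths(x)
    return [w1 * x, w1, w2 * x, w2]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_consequents(xs, ys):
    """Least squares estimate of the consequent parameters via the normal equations."""
    rows = [design_row(x) for x in xs]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)


def predict(x, theta):
    """ANFIS output for input x given consequent parameters theta."""
    return sum(t * a for t, a in zip(theta, design_row(x)))


if __name__ == "__main__":
    true_theta = [1.0, 0.5, -2.0, 3.0]
    xs = [-1.0 + 0.5 * k for k in range(9)]
    ys = [predict(x, true_theta) for x in xs]
    print(fit_consequents(xs, ys))
```

Because this step has a closed-form solution, only the premise parameters remain for the
iterative gradient descent search, which is the reduction in search space referred to
above.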


10.3.2 ANFIS Training Process 


The ANFIS procedure starts with two distinct datasets, training and testing, where the
training dataset comprises input and output vectors. The membership functions of the fuzzy
model are built from the training dataset, and the calculated output is compared with the
experimental output using the least squares method. If the error between the experimental
and calculated values is large, the gradient descent method automatically updates the
membership functions until the magnitude of the error falls below a threshold value, at
which point the procedure ends. The purpose of the checking dataset is to compare the
model with the real system. The training datasets of ANFIS are learned by a combination of
the least squares method and the gradient descent method. ANFIS is generally used in
adaptive process control systems to achieve the best possible performance. Figure 10.4
depicts the working process of ANFIS, where the training datasets are used to model the
ANFIS in the MATLAB platform. The datasets of the ANFIS model are placed in a matrix whose
last column holds the output. The membership functions are created according to the system
architecture, and the designer selects suitable membership functions for the input
variables for the chosen learning process. The initial set of membership functions is
created using the genfis command in MATLAB [53]. Training begins once the initial
membership functions have been created. After the fismat command is used, the input data
are trained and the membership functions are built automatically. When the training
process terminates, the final membership functions of the input variables and the training
error are produced. Checking datasets can be used to improve the accuracy of the model.
The ANFIS model can be built from the training dataset alone; however, the effectiveness
of the system can be increased by also applying the checking datasets.

In the present research, the evalfis function is used to evaluate the system performance
for the average localization error (ALE) in a WSN. In this model, input datasets are first
used to design the fuzzy system, although these input datasets do not include any output
values. The evalfis function returns the final output of the ANFIS system. The
relationship between the experimental and calculated ALE is established after training, by
evaluating the system output. Once the model is trained, the system can be tested further
against various sets of input data to check its usefulness. The training steps are shown
in Figure 10.4, where ALE is forecast by an ANFIS model with four input process variables
and one output variable [54-55]. The learning rules of the ANFIS model depend entirely on
the membership functions of the input and output variables, which are set by the human
expert. The anchor ratio (AR), transmission range (TR), node density (ND), and number of
iterations (IT) have three


FIGURE 10.4
Flowchart of ANFIS training system.


membership functions named high, medium, and low, while the output parameter, ALE, has
three membership functions, the same as the input variables. The ANFIS training procedure
begins with the fuzzy sets, the number of input variables, and the shape of the membership
functions of the input variables.

Here, N and e_i are the total number of predictions and the difference between the
predicted and original series output, respectively. The ANFIS structure, with four input
parameters and a single output parameter, ALE, appears in Figure 10.5. The errors for the
training and testing datasets are shown in Figures 10.6 and 10.7, respectively. The error
plots show that the training error and testing error both decrease as the number of epochs
increases. Figure 10.7 demonstrates the standard


FIGURE 10.5 
Distribution of training dataset with respect to dataset index. 
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FIGURE 10.6 
Distribution of testing dataset with respect to dataset index. 
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FIGURE 10.7 
Training error with respect to number of epochs (training error 0.0693 after 100 epochs). 
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rule viewers of the present fluid stream control framework between the four information 
factors and one yield controlled variable in ANFIS model. 


10.4 Result Analysis 


Overall, two different types of ANFIS model were used: the grid partition method and the
subclustering method.


10.4.1 Grid Partition Method 


Before the ANFIS model was implemented, the overall dataset of 107 members was partitioned
into two sections: a training dataset that contained 96 items and a testing or target dataset
containing 11 items. Training and testing data characteristics are presented with respect
to the data index in Figures 10.5 and 10.6, respectively. Each of the four input parameters
has three membership functions; hence there are 3^4 = 81 possible rule combinations between
the process input and output variables, as shown in Figure 10.8. In the rule viewer, the
output variable, namely the average localization error, can be obtained for each combination
of input variables, as shown in Figure 10.9. Figures 10.10-10.12 show how the output variable
ALE of the unknown sensor nodes depends on the possible combinations of input variables such
as anchor ratio (AR), transmission range (TR), node density (ND), and number of iterations
(IT).
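
The rule count follows directly from the grid partition: each combination of one membership function per input becomes one rule. The short Python sketch below enumerates those combinations. The label format imitates the rule editor listing of Figure 10.8, and the function name `grid_partition_rules` is an illustrative assumption, not part of any toolbox.

```python
from itertools import product

def grid_partition_rules(n_inputs=4, n_mfs=3):
    """Enumerate one rule per combination of input membership functions."""
    rules = []
    mf_indices = range(1, n_mfs + 1)
    # product() varies the last input fastest, as in the rule editor listing.
    for number, combo in enumerate(product(mf_indices, repeat=n_inputs), start=1):
        antecedent = " and ".join(
            f"(input{i} is in{i}mf{mf})" for i, mf in enumerate(combo, start=1)
        )
        rules.append(f"{number}. If {antecedent} then (output is out1mf{number}) (1)")
    return rules

rules = grid_partition_rules()
print(len(rules))   # number of rules for 4 inputs with 3 membership functions each
print(rules[17])    # rule 18 of the listing
```

With four inputs the rule base grows as 3^4, so adding a fifth input with three membership functions would already give 243 rules; this growth is one reason a clustering-based partition is often considered as an alternative.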


1. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf1) and (input4 is in4mf1) then (output is out1mf1) (1)
2. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf1) and (input4 is in4mf2) then (output is out1mf2) (1)
3. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf1) and (input4 is in4mf3) then (output is out1mf3) (1)
4. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf2) and (input4 is in4mf1) then (output is out1mf4) (1)
5. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf2) and (input4 is in4mf2) then (output is out1mf5) (1)
6. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf2) and (input4 is in4mf3) then (output is out1mf6) (1)
7. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf3) and (input4 is in4mf1) then (output is out1mf7) (1)
8. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf3) and (input4 is in4mf2) then (output is out1mf8) (1)
9. If (input1 is in1mf1) and (input2 is in2mf1) and (input3 is in3mf3) and (input4 is in4mf3) then (output is out1mf9) (1)
10. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf1) and (input4 is in4mf1) then (output is out1mf10) (1)
11. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf1) and (input4 is in4mf2) then (output is out1mf11) (1)
12. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf1) and (input4 is in4mf3) then (output is out1mf12) (1)
13. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf2) and (input4 is in4mf1) then (output is out1mf13) (1)
14. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf2) and (input4 is in4mf2) then (output is out1mf14) (1)
15. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf2) and (input4 is in4mf3) then (output is out1mf15) (1)
16. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf3) and (input4 is in4mf1) then (output is out1mf16) (1)
17. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf3) and (input4 is in4mf2) then (output is out1mf17) (1)
18. If (input1 is in1mf1) and (input2 is in2mf2) and (input3 is in3mf3) and (input4 is in4mf3) then (output is out1mf18) (1)

FIGURE 10.8
Rule editor for the input and output system variables containing 81 rules.

FIGURE 10.10
3D surface view of anchor ratio (AR), transmission range (TR), and ALE.

FIGURE 10.11
3D surface view of transmission range (TR), node density, and ALE.

FIGURE 10.12
3D surface views of node density, iteration, and ALE.


Figure 10.7 shows the training error with respect to the number of epochs. The training
error reaches 0.0693 after 100 epochs.


10.4.2 Subclustering Method 


As with the grid partition method, the test and training datasets are distributed over the
data index as already shown in Figures 10.5 and 10.6. In the subclustering method, the
training error is similar to that of the grid partition method, about 0.069 after 100 epochs,
as shown in Figure 10.13.

Figures 10.14 and 10.15 present the regression values for the training and validation
datasets, which are about 1, indicating that both proposed models fit the testing and
validation datasets well.
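
Each regression plot is summarized by a correlation coefficient R and a fitted line of the form Output ~= a*Target + b. The sketch below shows one way such a summary can be computed with an ordinary least-squares fit; the sample target and output values are invented for illustration and are not the chapter's data.

```python
import math

def regression_summary(targets, outputs):
    """Return (R, slope, intercept) of the least-squares line outputs ~ slope*targets + intercept."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    r = s_to / math.sqrt(s_tt * s_oo)   # Pearson correlation coefficient
    slope = s_to / s_tt
    intercept = mean_o - slope * mean_t
    return r, slope, intercept

# Hypothetical target/output pairs (not taken from the book).
targets = [100.0, 200.0, 300.0, 400.0, 500.0]
outputs = [105.0, 198.0, 310.0, 395.0, 492.0]
r, slope, intercept = regression_summary(targets, outputs)
print(f"R={r:.4f}  Output ~= {slope:.2f}*Target + {intercept:.1f}")
```

An R close to 1 together with a slope close to 1 and a small intercept indicates that the model output tracks the target closely.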

Figures 10.16 and 10.17 present the regression values for the training and test datasets.
In both cases the grid partition ANFIS model outperformed the clustering ANFIS model.
Figure 10.18 compares the actual ALE with the ALE calculated from the grid partition and
subclustering ANFIS models.

From Figure 10.18 it can be seen that the ALE calculated from the grid partition model is
comparatively better fitted than the ALE calculated from the cluster ANFIS model.

FIGURE 10.13
Training error 0.069 after 100 epochs.

FIGURE 10.14
Regression value after training. Panels: Training (R = 0.99238, Output ~= 0.97*Target + 8.4);
Test (R = 0.99093, Output ~= 0.95*Target + 17).

FIGURE 10.15
Regression value after validation. Panels: Validation (R = 0.99113, Output ~= 1*Target + 0.87);
All (R = 0.9919, Output ~= 0.97*Target + 8.8). Horizontal axis: Target.

FIGURE 10.18
Comparative result for experimental ALE, grid partition ALE, and cluster ALE.


Study of ANFIS Model to Forecast the Average Localization Error 175 


10.5 Conclusions 


Modeling and developing advanced wireless communication networks is an interesting task for
researchers. In wireless sensor networking systems, localizing any unknown node is very
important for proper power transmission between TX and RX. The optimum localization error of
unknown nodes depends on four input parameters, namely transmission range (TR), node density
(ND), anchor ratio (AR), and iterations (IT). In this research, a dataset of 107 samples was
used to implement the model and obtain the optimum average localization error, and a new
ANFIS model was used for prediction. The overall result analysis was carried out with two
different ANFIS models, namely the grid partition method and the subclustering method.

From the result analysis, it is concluded that both algorithms achieved the same magnitude
of training error, but for forecasting the ALE of unknown sensor nodes in the testing
dataset, the grid partition based ANFIS model fitted better than the subclustering method.

The following inferences are drawn from the graphs: 


* The type and number of membership functions are crucial when developing an
ANFIS architecture.

* The number of membership functions and training data samples has a favorable
impact on the ALE, the system output.

* Increasing the number of membership functions results in overfitting but has no
effect on the model's performance.

* A larger training sample size results in better outcomes.

* The number of epochs aids in preventing overfitting.

* The test results show what is necessary for improved performance.


Beyond the ANFIS model, how metaheuristic optimization strategies can be used to improve the
efficiency, accuracy, convergence speed, stability, and success rate of the present process
control is a further direction for future research.


11


Performance Estimation of Photovoltaic 
Cell Using Hybrid Genetic Algorithm 
and Particle Swarm Optimization 


11.1 Introduction 


The energy shortage and the adverse consequences of pollution make renewable energy
like solar energy more attractive in the modern era [1-3]. Simulation of the solar cell is
crucial for estimating its performance from the current-voltage and power-voltage
characteristics under various solar intensities and temperature conditions. Several studies
have been performed on the solar cell, and most of the literature focuses on the double
diode model (DDM) and the single diode model (SDM) [4-8]. In a photovoltaic (PV) system,
estimation of the optimal parameters is required to predict a solar cell's performance; for
that, a suitable optimization approach is needed [9, 10].

One of the most prominent subclasses of optimization algorithms is metaheuristics, whose
search patterns are commonly inspired by natural phenomena, animal behaviors, or
evolutionary concepts. The problem of parameter estimation of PV cells or modules has been
addressed using a multitude of metaheuristic approaches or their adaptations, including
genetic programming [11, 12], differential evolution [13, 14], artificial bee colony (ABC)
[15, 16], chaotic Jaya algorithm (JAYA) [17], teaching learning based optimization (TLBO)
[18, 19], shuffled frog leaping algorithm (SFLA) [20], moth-flame optimization algorithm
[21], ant lion optimization [22], sine-cosine algorithm [23], grey wolf optimization (GWO)
[24], flower pollination algorithm (FPA) [25, 26], improved elephant swarm water search
algorithm [27, 28], particle swarm optimization (PSO) [29], and so on. A comparative study
was performed for parameter estimation of a single solar cell using three metaheuristic
optimizations: the bat algorithm, cuckoo optimization, and the firefly algorithm [30].
Parametric optimization for a double diode solar cell was performed by wind-driven
optimization [31]. These metaheuristics have performed very well on the identified problem.
In addition to these metaheuristic optimization techniques, other AI techniques like the
fuzzy logic controller [32, 33], ANN model [34], genetic algorithm (GA) [35], and so on can
be used to optimize the parameters of the SDM and DDM models.

However, according to the no free lunch theorem [36], no single metaheuristic is efficient
for every problem. This is why estimating the parameters of PV cells or panels remains an
important task for researchers. The improved whale optimization algorithm (IWOA) [39],
biogeography-based heterogeneous cuckoo search algorithm [40], self-adaptive ensemble-based
differential evolution [41], hybrid differential evolution with whale optimization algorithm
[42], improved chaotic whale optimization algorithm [43], and hybrid bee pollinator flower
pollinator have been used to estimate the parameters of the double diode, single diode, and
PV module models [45], along with hybrid firefly and pattern search algorithms [46] and the
hybrid PSO and GWO algorithm [47, 48]. Other improved optimization techniques can still be
applied to obtain better parameters, convergence speed, and computational time, and this
remains an open challenge for present research.

Particle swarm optimization (PSO) is an uncomplicated technique that is simple to implement
and computationally efficient, but it has limited local/global search abilities under
certain constraints [49-51]. Its solution may become trapped in local optima, and because of
its fast convergence, its results may not be accurate [52]. The genetic algorithm, on the
other hand, provides the global optimum solution, but it requires a significant amount of
computational time, since it improves the result by increasing the number of iterations as
well as the number of computational steps [35, 53-55]. Consequently, we present an
integrated optimization methodology that takes advantage of both PSO and GA.

The hybrid GA-PSO (HGAPSO) method is expected to outperform other algorithms with similar
objectives and to run faster with applications of varied sizes. Furthermore, because the GA
mutation operator is used to improve the accuracy of the solutions for many complicated and
nonlinear problems, the hybrid GA-PSO algorithm is less likely to get caught in a local
optimum.

The rest of this chapter is organized as follows. The mathematical model of the solar cell
is presented in Section 11.2, and the formulation of the objective function for the double
and single diode models is given in Section 11.3. The proposed hybrid HGAPSO is explained in
Section 11.4. Next, the simulated results and a discussion of the experimental results with
the case study are given in Section 11.5. Finally, Section 11.6 concludes this chapter,
followed by the references.


11.2 Mathematical Model and Objective Function of the Solar Cell


11.2.1 Single Diode Model (SDM)


The equivalent circuit diagram of the SDM [48] is shown in Figure 11.1. The output current
of the SDM solar cell can be calculated as follows:

\[ I_C = I_{ph} - I_d - I_{sh} \tag{11.1} \]

where $I_C$, $I_{ph}$, $I_d$, and $I_{sh}$ are the cell output current, photogenerated
current, diode current, and shunt resistor current, respectively.

According to the Shockley equation, $I_d$ can be calculated as

\[ I_d = I_{sd} \left[ \exp\left( \frac{q (V_C + I_C R_s)}{n K T} \right) - 1 \right] \tag{11.2} \]

FIGURE 11.1
Single diode model [48].

where $I_{sd}$, $V_C$, $n$, and $R_s$ are the reverse saturation current, cell output
voltage, ideality factor, and series resistance, respectively.

Let $V_t = KT/q$, where $K$ is the Boltzmann constant, $T$ is the absolute temperature
(Kelvin), and $q$ is the electron charge. Then Eq. (11.2) can be rewritten as

\[ I_d = I_{sd} \left[ \exp\left( \frac{V_C + I_C R_s}{n V_t} \right) - 1 \right] \tag{11.3} \]

The current passing through the shunt resistor, $I_{sh}$, is formulated as

\[ I_{sh} = \frac{V_C + I_C R_s}{R_{sh}} \tag{11.4} \]

After combining Eqs. (11.1), (11.3), and (11.4), the I-V relationship of the SDM can be
expressed as

\[ I_C = I_{ph} - I_{sd} \left[ \exp\left( \frac{V_C + I_C R_s}{n V_t} \right) - 1 \right] - \frac{V_C + I_C R_s}{R_{sh}} \tag{11.5} \]

Hence Eq. (11.5) contains five parameters ($I_{ph}$, $I_{sd}$, $R_s$, $R_{sh}$, $n$), which
must be estimated by the optimization tool.
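
Because $I_C$ appears on both sides of Eq. (11.5), the cell current has to be found numerically for a given voltage. The following Python sketch solves the equation with Newton's method. The choice of solver, the function names, and the parameter values in the usage lines are assumptions made for illustration; they are not results from this chapter.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C

def thermal_voltage(temp_kelvin):
    """V_t = K*T/q for a single cell."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON

def sdm_residual(i_c, v_c, i_ph, i_sd, r_s, r_sh, n, v_t):
    """Right-hand side of Eq. (11.5) minus I_C; zero at the true cell current."""
    v_j = v_c + i_c * r_s
    return i_ph - i_sd * (math.exp(v_j / (n * v_t)) - 1.0) - v_j / r_sh - i_c

def sdm_current(v_c, i_ph, i_sd, r_s, r_sh, n, v_t, tol=1e-12, max_iter=100):
    """Solve the implicit Eq. (11.5) for I_C with Newton's method."""
    i_c = i_ph  # starting guess: the photogenerated current
    for _ in range(max_iter):
        v_j = v_c + i_c * r_s
        e = math.exp(v_j / (n * v_t))
        f = i_ph - i_sd * (e - 1.0) - v_j / r_sh - i_c
        df = -i_sd * e * r_s / (n * v_t) - r_s / r_sh - 1.0  # d f / d I_C
        step = f / df
        i_c -= step
        if abs(step) < tol:
            break
    return i_c

# Illustrative parameter values only (hypothetical, not the chapter's results).
v_t = thermal_voltage(33.0 + 273.15)
params = dict(i_ph=0.76, i_sd=0.32e-6, r_s=0.036, r_sh=53.7, n=1.48, v_t=v_t)
for v in (0.0, 0.3, 0.5):
    print(v, sdm_current(v, **params))
```

Sweeping the voltage in this way produces a simulated I-V curve that can be compared with measured data.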


11.2.2 Double Diode Model (DDM) 


The equivalent circuit diagram of the DDM [48] is shown in Figure 11.2. The cell output
current $I_C$ can be represented by the following equation:

\[ I_C = I_{ph} - I_{d1} - I_{d2} - I_{sh} \tag{11.6} \]

FIGURE 11.2
Double diode model [48].

In the DDM, the I-V relationship can finally be expressed as

\[ I_C = I_{ph} - I_{sd1} \left[ \exp\left( \frac{V_C + I_C R_s}{n_1 V_t} \right) - 1 \right] - I_{sd2} \left[ \exp\left( \frac{V_C + I_C R_s}{n_2 V_t} \right) - 1 \right] - \frac{V_C + I_C R_s}{R_{sh}} \tag{11.7} \]

From Eq. (11.7) it is seen that the model contains seven parameters that have to be
estimated ($I_{ph}$, $I_{sd1}$, $I_{sd2}$, $R_s$, $R_{sh}$, $n_1$, $n_2$).

A smaller objective function value corresponds to better estimated parameters. The nonlinear
and transcendental objective function is hard to solve using conventional methods.


11.2.3 PV Module Model 


The single-diode and double-diode models of a PV module, which consists of cells connected
in series, can also be expressed by Eqs. (11.5) and (11.7), where

\[ V_t = N_s K T / q \]

and $N_s$ is the number of cells connected in series.


11.3 Objective Function 


Metaheuristics are often used to fit the estimated current-voltage curve of a PV system to
the measured curve. The goal is to determine values of the key parameters that minimize the
difference between the measured and simulated current. Equation (11.8) defines the objective
function as the root mean square error (RMSE) [56].


\[ \mathrm{RMSE}(X) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} f(V_{C,i}, I_{C,i}, X)^2 } \tag{11.8} \]

where $N$ stands for the number of experimental data points, and $X$ stands for the set of
parameters to be estimated.

For the single diode model, $f(V_C, I_C, X)$ and $X$ can be expressed as Eqs. (11.9) and
(11.10), respectively.

\[ f(V_C, I_C, X) = I_{ph} - I_{sd} \left[ \exp\left( \frac{V_C + I_C R_s}{n V_t} \right) - 1 \right] - \frac{V_C + I_C R_s}{R_{sh}} - I_C \tag{11.9} \]

\[ \mathrm{RMSE}_{min} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( I_{measured} - I_{calculated}(I_{ph}, I_{sd}, R_s, R_{sh}, n) \right)^2 }, \quad X = \{ I_{ph}, I_{sd}, R_s, R_{sh}, n \} \tag{11.10} \]

For the double diode model, $f(V_C, I_C, X)$ and $X$ can be expressed as Eqs. (11.11) and
(11.12), respectively.

\[ f(V_C, I_C, X) = I_{ph} - I_{sd1} \left[ \exp\left( \frac{V_C + I_C R_s}{n_1 V_t} \right) - 1 \right] - I_{sd2} \left[ \exp\left( \frac{V_C + I_C R_s}{n_2 V_t} \right) - 1 \right] - \frac{V_C + I_C R_s}{R_{sh}} - I_C \tag{11.11} \]

\[ \mathrm{RMSE}_{min} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( I_{measured} - I_{calculated}(I_{ph}, I_{sd1}, I_{sd2}, R_s, R_{sh}, n_1, n_2) \right)^2 }, \quad X = \{ I_{ph}, I_{sd1}, I_{sd2}, R_s, R_{sh}, n_1, n_2 \} \tag{11.12} \]

A lower objective value corresponds to more accurately estimated parameters. This task is
challenging because the objective function is nonlinear and transcendental.
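
The objective function of Eqs. (11.8), (11.9), and (11.11) can be sketched in a few lines of Python. The helper `synthetic_sdm_data`, which builds exact (V, I) pairs by sweeping the junction voltage, and all numerical values are assumptions introduced only to exercise the code; they are not measured data.

```python
import math

def sdm_error(v, i, x, v_t):
    """Eq. (11.9): single diode model current minus the measured current."""
    i_ph, i_sd, r_s, r_sh, n = x
    v_j = v + i * r_s
    return i_ph - i_sd * (math.exp(v_j / (n * v_t)) - 1.0) - v_j / r_sh - i

def ddm_error(v, i, x, v_t):
    """Eq. (11.11): the same error with two diode branches."""
    i_ph, i_sd1, i_sd2, r_s, r_sh, n1, n2 = x
    v_j = v + i * r_s
    d1 = i_sd1 * (math.exp(v_j / (n1 * v_t)) - 1.0)
    d2 = i_sd2 * (math.exp(v_j / (n2 * v_t)) - 1.0)
    return i_ph - d1 - d2 - v_j / r_sh - i

def rmse(error_fn, data, x, v_t):
    """Eq. (11.8): root mean square of the per-point errors."""
    return math.sqrt(sum(error_fn(v, i, x, v_t) ** 2 for v, i in data) / len(data))

def synthetic_sdm_data(x, v_t, junction_voltages):
    """Build exact (V, I) pairs for parameters x by sweeping V_j = V + I*R_s."""
    i_ph, i_sd, r_s, r_sh, n = x
    data = []
    for v_j in junction_voltages:
        i = i_ph - i_sd * (math.exp(v_j / (n * v_t)) - 1.0) - v_j / r_sh
        data.append((v_j - i * r_s, i))
    return data

V_T = 1.380649e-23 * 306.15 / 1.602176634e-19   # thermal voltage at 33 degrees C
x_true = (0.76, 0.32e-6, 0.036, 53.7, 1.48)     # hypothetical parameter set
data = synthetic_sdm_data(x_true, V_T, [0.05 * k for k in range(12)])
x_guess = (0.70, 0.32e-6, 0.036, 53.7, 1.48)    # photocurrent off by 0.06 A
print(rmse(sdm_error, data, x_true, V_T))
print(rmse(sdm_error, data, x_guess, V_T))
```

An optimizer only needs to call `rmse` with a candidate parameter vector; the closer the candidate is to the parameters that generated the data, the smaller the returned value.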


11.4 Proposed Methodology 
11.4.1 Hybrid GA-PSO (HGAPSO) Algorithm


This section introduces a hybrid algorithm that merges the GA and PSO strategies. The
flowchart in Figure 11.3 shows the essential steps of the HGAPSO algorithm. PSO is among the
most effective methods; however, because it converges so quickly, it often becomes trapped
in local optima in complex problems. The GA has outperformed other strategies on a variety
of difficult problems. Although the GA may converge somewhat more slowly, it has greater
exploration capability. No single strategy handles every problem effectively. This can be
addressed by combining the existing approaches to obtain the best overall solution [57, 58].
The proposed technique can improve global search accuracy and avoid premature convergence,
which makes it less likely to become trapped in a local optimum. The hybrid algorithm
combines the best characteristics of both techniques (Figure 11.3).


184 Artificial Intelligence for Cognitive Modeling 


FIGURE 11.3
Flowchart of the proposed HGAPSO algorithm. Steps shown: initialize the number of
iterations, the population, and the parameters of GA and PSO; apply the GA to the fixed-size
population and perform selection, crossover, and mutation; check whether half of the maximum
total iterations has been reached (if no, continue the GA); if yes, apply PSO to the
population generated by the GA; calculate the Gbest and Pbest positions; update the position
and velocity; stop when the maximum iteration is reached.
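
The flow of Figure 11.3 can be sketched in Python as a two-phase loop: a GA for the first half of the iterations, then PSO started from the GA population. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the tournament selection, the elitism step, the uniform reset mutation, the inertia weight of 0.6, and the crossover rate of 0.7 are choices made here for illustration, and the sphere test function stands in for the RMSE objective.

```python
import random

def hgapso(objective, bounds, pop_size=30, max_iter=200, seed=0,
           crossover_rate=0.7, mutation_rate=0.05, w=0.6, c1=1.5, c2=2.0):
    """Minimize `objective` with a GA for the first half of the iterations, then PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    # Phase 1: genetic algorithm (selection, crossover, mutation).
    for _ in range(max_iter // 2):
        ranked = sorted(pop, key=objective)
        new_pop = [ranked[0][:]]  # elitism (an added assumption): keep the best individual
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(ranked, 2), key=objective)
            p2 = min(rng.sample(ranked, 2), key=objective)
            child = p1[:]
            if dim > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, dim)        # single-point crossover
                child = p1[:cut] + p2[cut:]
            for d in range(dim):
                if rng.random() < mutation_rate:   # uniform reset mutation
                    child[d] = rng.uniform(*bounds[d])
            new_pop.append(child)
        pop = new_pop

    # Phase 2: PSO seeded with the population produced by the GA.
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pop]
    pbest_val = [objective(p) for p in pop]
    g = min(range(pop_size), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(max_iter - max_iter // 2):
        for i in range(pop_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pop[i][d])
                             + c2 * r2 * (gbest[d] - pop[i][d]))
                lo, hi = bounds[d]
                pop[i][d] = min(max(pop[i][d] + vel[i][d], lo), hi)  # keep inside bounds
            val = objective(pop[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pop[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pop[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val = hgapso(sphere, [(-5.0, 5.0)] * 3, seed=1)
print(best, best_val)
```

For the PV problem, `objective` would be the RMSE of Eq. (11.8) and `bounds` the parameter search ranges.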


11.5 Results and Discussion 
11.5.1 Test Information 


In this section, we use test datasets of double diode and single diode RTC France silicon
solar cells with 26 sets of I-V data at a temperature of 33°C and an irradiation of 1000
W/m² [30, 48]. The results of the HGAPSO optimization approach on the two previously
described benchmark problems are presented in this part. Additionally, we compare HGAPSO
with the two basic methods, PSO and GA, and assess the results. PSO and GA are two
well-known optimization strategies that we have adopted for reference purposes; the settings
used for each algorithm are listed in Table 11.1.

TABLE 11.1
Parameter Setting for Each Algorithm

Algorithm   Parameter settings
PSO         Population size N = 100; inertia weight from 0.9 to 0.4; C1 = C2 = 2
GA          Population size 100; crossover percentage (pc) 0.7; mutation percentage (pm) 0.3;
            extra range factor for crossover (γ) 0.4; mutation rate (mu) 0.1
HGAPSO      Population size N = 100; mutation rate 0.05; single point crossover;
            acceleration coefficient (C1) 1.5 and global learning coefficient (C2) 2;
            r1 and r2 in (0, 1); degree of importance: a1 and a2 (0.4, 0.4) and a3 0.4

TABLE 11.2
Parameters Search Ranges [48] of RTC France Solar Cell Module

PV System: RTC Solar Cell

Parameter     Lower Limit   Upper Limit
I_ph (A)      0             1
I_sd (µA)     0             1
R_s (Ω)       0             0.5
R_sh (Ω)      0             100
n             1             2

For all calculations, the maximum number of iterations and the population size are set to
5000 and 100, respectively [53, 59, 60]. For the SDM and DDM, the search space is limited to
($I_{ph}$, $I_{sd}$, $R_s$, $R_{sh}$, $n$) and ($I_{ph}$, $I_{sd1}$, $I_{sd2}$, $R_s$,
$R_{sh}$, $n_1$, $n_2$), respectively. Hence we need an effective optimization tool that can
find the optimal values in the search space of the double diode and single diode PV cell
models. The RTC solar cell parameter search range for the optimization is shown in
Table 11.2.

Because of the stochastic nature of metaheuristics, repeated runs with the same settings may
give different outputs. Each algorithm was therefore run 20 times for each case, and the
statistical study was carried out on the resulting data. For the modeling of solar cells, we
used MATLAB 2013b on a computer with an Intel(R) Core(TM) i3 processor and 4 GB RAM running
the Windows operating system [61, 62]. In these quantitative investigations, we evaluated
the proposed method using a number of tests, including the computational efficiency test,
the fitness test, the convergence test, the reliability test, and the accuracy test, which
are all explained subsequently. Finally, a summary of the performance is provided.


11.5.1.1 Fitness Test 


The objective value, or fitness, achieved by an optimization algorithm is the most important
criterion for demonstrating its efficiency [59, 60]. Here, we consider three measures of
fitness: the worst fitness, the mean fitness, and the best fitness obtained after multiple
program runs. From Table 11.3, it can be seen that the proposed HGAPSO reaches the best
fitness value (minimum RMSE) in all of the photovoltaic cases. HGAPSO has the least RMSE for
both the RTC SDM and DDM solar cell models.

TABLE 11.3
Comparative Study Based on Different Levels of Fitness

Case               Method   Maximum RMSE   Minimum RMSE   Mean of RMSE
RTC single diode   GA       0.01753624     0.001234366    0.0025362
                   PSO      0.00244805     0.001022083    0.0020571
                   HGAPSO   0.00121203     0.000925481    0.0019254
RTC double diode   GA       0.03214202     0.001693715    0.0032191
                   PSO      0.03602997     0.001184587    0.00236821
                   HGAPSO   0.00291256     0.001025632    0.00156812


11.5.1.2 Reliability Test 


An optimization tool ought to reach the global minimum as closely as possible and do so
consistently; that is, it ought to succeed in every run [63]. Thus, we tested the
reliability of the hybrid HGAPSO in this subsection and compared it with the other basic
optimization techniques on the basis of standard deviation. The standard deviation measures
the variability and consistency of a sample or population. Table 11.4 shows the comparative
study based on the standard deviation. The standard deviations of HGAPSO are the lowest for
both the SDM and DDM models.
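
The fitness and reliability measures reduce to simple statistics over the RMSE values of repeated runs. A minimal Python sketch follows; the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used, and the input values are invented.

```python
import statistics

def summarize_runs(rmse_per_run):
    """Best, worst, mean, and standard deviation of the RMSE over repeated runs."""
    return {
        "best": min(rmse_per_run),
        "worst": max(rmse_per_run),
        "mean": statistics.mean(rmse_per_run),
        "std": statistics.stdev(rmse_per_run),  # sample standard deviation (assumed)
    }

# Hypothetical RMSE values from five runs (not the chapter's data).
runs = [0.0012, 0.0010, 0.0015, 0.0011, 0.0013]
print(summarize_runs(runs))
```

A small standard deviation means the algorithm lands near the same objective value on every run, which is the sense of reliability used in this subsection.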


11.5.1.3 Computational Efficiency Test 


Computational efficiency is the execution time taken by each optimization technique to
determine the optimum value of the solar cell model [35]. Table 11.5 shows the average
computation time for all the algorithms used in this research. From Table 11.5, it was found
that HGAPSO needed less computing time when the parameters of the RTC double diode and
single diode systems were optimized.


TABLE 11.4
Comparative Study Based on Standard Deviation

Case               Method   Standard Deviation of RMSE
RTC single diode   GA       0.0088454
                   PSO      0.02896407
                   HGAPSO   0.00321841
RTC double diode   GA       0.00659481
                   PSO      0.02646232
                   HGAPSO   0.00321432

TABLE 11.5
Comparative Study Based on Computational Time

Case               Method   Average Computational Time (Sec)
RTC single diode   GA       78.8936
                   PSO      169.4620
                   HGAPSO   74.412
RTC double diode   GA       84.28
                   PSO      294.37
                   HGAPSO   68.643


11.5.1.4 Convergence Test 


The search performance of an optimization technique cannot be captured by an exact
mathematical relationship. An optimization technique is said to have better convergence
speed when it provides the least RMSE over several runs as the number of iterations varies
[56, 64]. Here we performed the convergence test with 100 to 5000 iterations, and each case
was run 20 times. All the algorithms, HGAPSO, PSO, and GA, were run for both models to
obtain the minimum fitness. Figures 11.4 and 11.5 present the convergence test for all the
optimization tools. From the graphs, it is concluded that HGAPSO gives better convergence
(for both the double diode model and the single diode model) than the other two basic
metaheuristic optimization techniques.


11.5.1.5 Accuracy Test 


An accuracy test was conducted to compare the simulated diode current with the experimental
current under the same experimental conditions [35, 65]. Here we use two different error
indexes, individual absolute error (IAE) and relative error (RE), defined in Eqs. (11.18)
and (11.19).

\[ \mathrm{IAE} = \left| I_{measured} - I_{calculated} \right| \tag{11.18} \]

\[ \mathrm{RE} = \frac{I_{measured} - I_{calculated}}{I_{measured}} \tag{11.19} \]

Moreover, total absolute error (TAE) can be defined as

\[ \mathrm{TAE} = \sum_{i=1}^{n} \mathrm{IAE}_i \tag{11.20} \]

where $n$ is the number of trials, and $I_{measured}$ and $I_{calculated}$ are the
experimental and calculated values of current. Tables 11.6 and 11.7 present the optimum
parameters of the single diode and double diode PV cells obtained with the three different
optimization techniques. Tables 11.8 and 11.9 present the total absolute error and accuracy
of the single diode and double diode models for the optimization techniques. From the
experimental results it is seen that HGAPSO obtained the least RMSE and TAE.

FIGURE 11.4
Convergence speed for single diode modeling using different metaheuristics. Curves: GA, PSO,
and HGAPSO; horizontal axis: number of iterations (100 to 5000); vertical axis: convergence
speed.

FIGURE 11.5
Convergence speed for double diode modeling using different metaheuristics.
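
The three error indexes of Eqs. (11.18)-(11.20) translate directly into code. The Python sketch below uses invented current values purely to show the calculation.

```python
def individual_absolute_errors(measured, calculated):
    """Eq. (11.18): IAE for each data point."""
    return [abs(m - c) for m, c in zip(measured, calculated)]

def relative_errors(measured, calculated):
    """Eq. (11.19): RE for each data point (measured values must be nonzero)."""
    return [(m - c) / m for m, c in zip(measured, calculated)]

def total_absolute_error(measured, calculated):
    """Eq. (11.20): sum of the individual absolute errors."""
    return sum(individual_absolute_errors(measured, calculated))

# Hypothetical measured and calculated currents in amperes (not the chapter's data).
measured = [0.50, 0.40]
calculated = [0.49, 0.41]
print(individual_absolute_errors(measured, calculated))
print(relative_errors(measured, calculated))
print(total_absolute_error(measured, calculated))
```

Note that RE keeps its sign, so it shows whether the model overestimates or underestimates the current at each point, whereas IAE and TAE only measure magnitude.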


11.5.2 Overall Efficiency 


We now summarize the performance of the HGAPSO metaheuristic optimization method based on
the evaluation criteria described above and compare it with the other methods. To this end,
we assigned a performance score to every algorithm for each of the criteria. The score is
based on the ratio of the cases in which an algorithm achieves the best result for a
criterion to all cases, such as the single and double diode models of RTC solar cells.
Table 11.9 presents the comparative evaluation based on RMSE and accuracy, which is used to
assess the efficiency of the proposed HGAPSO algorithm. Table 11.10 shows how the proposed
HGAPSO fared against the other metaheuristics under all scenarios (see also Figures 11.6
and 11.7).

TABLE 11.6
Optimal Parameters for Single Diode Model

Parameters   PSO       GA        HGAPSO
I_ph         0.92      0.75      0.7233
I_sd         0.452     0.4585    0.7109
R_s          0.045     0.03985   0.00018
R_sh         20.14     60.2308   10.33751
n            1.8481    1.787     1.6222

TABLE 11.7
Optimal Parameters for Double Diode Model

Parameters   PSO        GA       HGAPSO
I_ph         0.73       0.7004   0.75978
I_sd1        0.928      0.59     0.53565
R_s          0.0039     0.0297   0.03449
R_sh         85.32      70.938   100
n_1          1.7        1.582    1.53443
I_sd2        0.000126   0.235    0.125
n_2          1.8        1.67     2

TABLE 11.8
Comparative Study Based on Total Absolute Error

Case               Method   Total Absolute Error
RTC Single Diode   GA       0.027807
                   PSO      0.021422
                   HGAPSO   0.01584
RTC Double Diode   GA       0.022134
                   PSO      0.02431
                   HGAPSO   0.01648

TABLE 11.9
Comparative Study Based on RMSE and Accuracy

Case               Method   RMSE     Accuracy
RTC Single Diode   GA       1.7861   98.214
                   PSO      2.896    97.103
                   HGAPSO   0.885    99.115
RTC Double Diode   GA       1.7689   98.2311
                   PSO      2.646    97.354
                   HGAPSO   0.926    99.074

FIGURE 11.6
Relative error versus number of experimental data in single diode model using different metaheuristics.

FIGURE 11.7
Relative error versus number of experimental data in double diode model using different metaheuristics.


11.5.3 Validation Between Manufacturer's Datasheet and Experimental Datasheets 


In this section, to validate the proposed model, we extract the optimum parameters for both
the double diode and single diode models of solar modules. The experimental data are used to
produce I-V characteristics that are identical to those given in the manufacturer datasheet
at an irradiance level of 1000 W/m² and a temperature of 25°C.


11.5.3.1 Case Study 1: Single Diode Model 


The optimal parameters of the single diode model are those that give the least RMSE after
the 20 runs. Datasets have been taken for a temperature of 25°C and an irradiance level of
1000 W/m². The ideality factor n for both the ideal (indicated in the literature) and the
estimated single diode model lies in the interval [1, 2]. From Table 11.11, possible ranges
of input parameters like series resistance fall inside the interval [0 Ω, 1 Ω]. The
comparison between the estimated models and the experimental data of the single diode model
at an irradiance level of 1000 W/m² is given in Figure 11.8.

TABLE 11.10
Comparative Study Based on Different Statistical Criteria

Method   Standard    Time   Convergence   TAE    RE    RMSE   Accuracy   Max.   Min.   Mean   Total
         Deviation                                                        RMSE   RMSE   RMSE   Score
GA       0.33        0.33   0.5           0      0.5   0.33   0.33       0.33   0.33   0.33   3.31
PSO      0           0      0             0.33   0     0      0.33       0.33   0.33   0.33   1.65
HGAPSO   0.66        0.66   0.5           0.66   0.5   0.66   0.33       0.33   0.33   0.33   4.96

TABLE 11.11
The Extracted Optimum Parameters for S75 PV Module at 25°C Temperature and Irradiance Level
of 1000 W/m² by HGAPSO (Single Diode Model)

Solar Module   Temperature   I_ph      I_sd     R_s      R_sh      n
S75            25°C          4.27431   0.0771   0.2547   1939.26   1.481

FIGURE 11.8
Comparative studies between experimental and calculated current for single diode model.
Curves: experimental, PSO, GA, and HGAPSO; horizontal axis: number of experimental data
(1 to 26).


11.5.3.2 Case Study 2: Double Diode Model 


The optimal parameters of the two diode models are achieved when it gives the least RMSE 
after the 20 runs. Datasets have been taken for the irradiance levels of 1000 W /m? and tem- 
perature of 25*C. From Table 11.12, we obtained the possible ranges of input parameters, 
namely, that series resistance lies in [0 Q, 0.06 Q], parallel resistance is low (> 100 Q), and 
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TABLE 11.12 


The Extracted Optimum Parameters for S75 PV Module at 25°C Temperature and Irradiance Level of 
1000 W/m? by HGAPSO (Double Diode Model) 


Solar Module Temperature Ip Isa Isa R, Ra nı No 
$75 25°C 4.478 0.0000 0.0537 0.2843 7542.85 3.4522 1.1747 

FIGURE 11.9
Comparative studies between experimental and calculated current for double diode model. (Horizontal axis: number of experiments; curves: experimental, PSO, GA, and HGAPSO.)


ideal values of n1 and n2 would be 1 and 2 instead of industrial sample values of 1 and 5, respectively. The comparison between the estimated models and the experimental data of the double diode model at an irradiance level of 1000 W/m² is given in Figure 11.9.


11.6 Conclusions 


This chapter presents the optimal characteristics obtained from the experimental solar cell datasets of single diode and double diode RTC cells for the control and design of a PV system. The estimated current, as well as the power rating, of a solar cell depends on several input parameters; those discussed in Section 11.2 are the photogenerated current, shunt resistance, series resistance, reverse saturation current, and ideality factor. Owing to the complexity of the optimization problem, a new multiobjective hybrid optimization method involving both PSO and GA is applied. Moreover, HGAPSO is compared with PSO and GA, each used in isolation, for both the single and double diode objective functions.

Owing to its capability of searching for a global optimum and its convergence speed, the proposed HGAPSO algorithm is an effective algorithm that can be used as a supplementary method to estimate the PV module simulation parameters. Section 11.5 compared the efficiency of the proposed hybrid HGAPSO with that of PSO and GA in terms of convergence speed, computational efficiency, root mean square error, and accuracy.

A subsection of Section 11.5 presents the case study that verifies the effectiveness of HGAPSO against the experimental datasets, both numerically and graphically. In this case study, the proposed hybrid algorithm is applied to retrieve, precisely and efficiently, the parameters of a PV module (S75) at an irradiance level of 1000 W/m² and a temperature of 25°C.

Every part of the results shows that the performance of the proposed HGAPSO surpassed that of the standard PSO and GA. However, a relatively large inaccuracy and standard deviation are among the major flaws of HGAPSO. Future research should focus on enhancing the reliability and consistency of the proposed optimization. Introducing new formulation frameworks, and modifying or combining methods to improve effectiveness, might be a strategy to overcome these limitations.


Bio-Inspired Optimization-Based PID Controller
Tuning for a Nonlinear Cylindrical Tank System


12.1 Introduction 


In process industries, most real-time processing units, such as biochemical reactors, cylindrical tank systems, and continuous stirred tank reactors (CSTRs), are highly nonlinear in nature. Tuning the controller parameters of these systems for stability and for disturbance rejection is therefore critical, and proper tuning is essential in all nonlinear process control systems.

Designing the controller parameters of a stable system is fairly easy, but for a nonlinear process or an unstable system, the designer should specify the range (maximum and minimum) of the controller parameters, as well as the average of these limiting values, to stabilize the system. Increasing the time delay of an unstable system narrows the range of controller parameter values that can bring the system under control.

Researchers have used conventional controllers for model-based systems [1], and according to their requirements, system models were further reduced to first-order or first-order-with-time-delay systems. For unstable systems, however, the fitting rule could not provide better results after the system model was reduced. Most of the classical proportional + integral + differential (PID) tuning methods use computational techniques to obtain the best possible controller parameters. In recent years, a number of heuristic algorithms have also been used for proper tuning of control parameters.

In most process industries, maintaining the liquid level in a controlled manner is a vital task. In the tanks, fluid is handled chemically or by transfusion, but the liquid level in the tanks must be kept under stable conditions [2, 3]. In this work, the tank in which the liquid level is to be controlled is cylindrical. Cylindrical tank systems are widely used in process industries such as the chemical, pharmaceutical, and food processing industries. A change in shape with respect to height makes a conical tank a nonlinear system. Control of the fluid is therefore a major task [4], and an adaptive and reliable technique must be used to make the system stable.

A large amount of work has been done on PID tuning of nonlinear systems. S. M. Girirajkumar and D. Mercy et al. [5] tuned a PID controller using the Z-N method and intelligent techniques like the genetic algorithm. Lin et al. [6] proposed an adaptive algorithm for PID controllers based on the theory of adaptive interaction. Goléa et al. [7] proposed a fuzzy model reference adaptive control for nonlinear systems in which the adaptation law is obtained by a PI law. For PID gain scheduling, Viljamaa and Koivo [8] proposed a predictive argumentation framework. Marcelo et al. [9] proposed a Lyapunov-based stabilizing control design technique for an uncertain nonlinear dynamic system using a fuzzy model. Chatterjee et al. [10] presented a comparative study of Z-N and fuzzy logic based PID controllers for the speed control of a DC motor. Pijush et al. [11, 12] proposed fuzzy logic based AI techniques for prediction of the flow rate of contact-type flow sensors. Pijush et al. also studied the performance of conventional AI techniques like ANN [13], ANFIS [14], and GA [15, 16] for liquid level rate prediction in process control.

In this chapter, the controller parameters of a cylindrical tank are tuned by the flower pollination algorithm (FPA) and bacteria foraging optimization (BFO). The different error measures and the transient properties of both transfer functions of the cylindrical tank (the first-order system and the first-order system with time delay) are analyzed in detail for both algorithms.

The rest of the chapter is organized as follows. Section 12.2 presents the overall methodology of the research, including a diagram of the cylindrical tank and a brief description of the FPA and BFO based controllers. Section 12.3 shows the simulation results of the BFO and FPA process models and a real-time implementation using a cylindrical tank system. Section 12.4 concludes the present research work.


12.2 Methodology 


Intelligent control methods iteratively determine the proportional, integral, and derivative actions from the error and the derivative of the error, and then use these results to update the gains of the PID controller. Three distinct tuning criteria, the integral of absolute error (IAE), the integral of square error (ISE), and the integral of time-weighted absolute error (ITAE), were used to evaluate which controller performed best. Each of these error integrals reflects the magnitude and nature of the error and serves as a performance measure. The performance of a PID controller depends on the proportional gain ($k_p$), integral gain ($k_i$), and derivative gain ($k_d$). These gains can be obtained by the following methods: Ziegler-Nichols (ZN), gain-phase margin, root locus, minimum variance, and gain scheduling. However, some of these strategies are rather difficult to apply, and they are not the best for controlling unpredictable, high-order systems. Different search techniques have therefore been proposed to improve the efficacy and obtain appropriate $k_p$, $k_i$, and $k_d$. Several criteria have been suggested in the literature for assessing PID performance: ITAE, the integral of time-weighted squared error (ITSE), and IAE.
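
The error criteria named above can be illustrated with a short numerical sketch. It assumes the error signal is available as samples on a time grid and integrates with the trapezoidal rule; the function names and the test signal e(t) = exp(-t) are hypothetical and are not taken from the simulations reported in this chapter.

```python
import numpy as np


def _trapezoid(y, t):
    """Integrate samples y over the time grid t with the trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def error_indices(t, e):
    """Return IAE, ISE, ITAE, and ITSE for an error signal e sampled at times t."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    return {
        "IAE": _trapezoid(np.abs(e), t),        # integral of |e(t)|
        "ISE": _trapezoid(e ** 2, t),           # integral of e(t)^2
        "ITAE": _trapezoid(t * np.abs(e), t),   # integral of t*|e(t)|
        "ITSE": _trapezoid(t * e ** 2, t),      # integral of t*e(t)^2
    }


# Hypothetical error signal e(t) = exp(-t). Over an infinite horizon the exact
# values are IAE = 1, ISE = 0.5, ITAE = 1, and ITSE = 0.25.
t = np.linspace(0.0, 20.0, 20001)
indices = error_indices(t, np.exp(-t))
```

The time-weighted criteria (ITAE and ITSE) penalize errors that persist late in the response, which is why they tend to favor fast-settling controllers.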

In this research, the overall process was performed in three stages, as shown in Figure 12.1. In stage 1, the mathematical model of a cylindrical tank system is formulated from the inflow rate and outflow rate, as described in Subsection 12.2.1. In stage 2, the objective function is obtained from the PID controller transfer function together with the transfer function of the cylindrical tank. Finally, in stage 3, two different bio-inspired optimization techniques are applied to find the optimum values of the proportional gain ($k_p$), integral gain ($k_i$), derivative gain ($k_d$), and the error indices.
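
As a minimal sketch of how stage 2 can turn a set of PID gains into an objective value, the code below closes a PID loop around a first-order tank model and returns the ITAE of the unit-step response. It is an illustration under stated assumptions, not the MATLAB implementation used in this chapter: the plant is taken as gain/(tau*s + 1) with gain = R = 0.2 and tau = RC = 6.08 (from R = 0.2 and C = 30.4), the loop is integrated with a forward-Euler step, and the names `simulate_pid` and `itae_cost` are hypothetical.

```python
import numpy as np


def simulate_pid(kp, ki, kd, gain=0.2, tau=6.08, dt=0.01, t_end=100.0, setpoint=1.0):
    """Unit-step closed-loop response of a PID controller on gain/(tau*s + 1)."""
    n = int(round(t_end / dt)) + 1
    t = np.arange(n) * dt
    y = np.zeros(n)
    integral = 0.0
    prev_error = setpoint - y[0]  # avoids a derivative kick on the first sample
    for k in range(n - 1):
        error = setpoint - y[k]
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        # Forward-Euler step of tau*dy/dt + y = gain*u
        y[k + 1] = y[k] + dt * (gain * u - y[k]) / tau
        prev_error = error
    return t, y


def itae_cost(gains):
    """Objective for an optimizer: ITAE of the unit-step response for (kp, ki, kd)."""
    kp, ki, kd = gains
    t, y = simulate_pid(kp, ki, kd)
    error = 1.0 - y
    return float(np.sum(t * np.abs(error)) * (t[1] - t[0]))


t, y = simulate_pid(2.0, 1.0, 0.0)  # hypothetical gains, for illustration only
```

An optimizer such as FPA or BFO would then search the gain space for the triple that minimizes `itae_cost`; any of the other error indices could be substituted in the same way.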


12.2.1 Mathematical Model of Cylindrical Tank 


In the pharmaceutical industries, a cylindrical tank is used because the liquid flow is laminar according to the flow regime (the Reynolds number is less than 2000). The mathematical model of a cylindrical tank is derived as a differential equation based on the concepts of resistance and capacitance. Figure 12.2 shows the schematic diagram of the cylindrical tank [17].

FIGURE 12.1
Structure of the proposed model.

FIGURE 12.2
Schematic diagram of the cylindrical tank system [17].

Operating parameters of the cylindrical tank:

$q_i$ = small deviation of inflow rate from its steady-state value
$q_o$ = small deviation of outflow rate from its steady-state value
$H$ = steady-state head
$h$ = small deviation of head from its steady-state value


If we assume that $q_i$ is the input of the system and $h$ is its output, the transfer function of the system is obtained as

$$\frac{H(s)}{Q_i(s)} = \frac{R}{1 + RCs} \qquad (12.1)$$

If $q_o$ is taken as the output and the input is $q_i$, then the transfer function of the system is obtained as follows:

$$q_o = \frac{h}{R}$$

Taking the Laplace transform of both sides, we get

$$Q_o(s) = \frac{H(s)}{R} \qquad (12.2)$$

From Eqs (12.1) and (12.2), we obtain [17]

$$\frac{Q_o(s)}{Q_i(s)} = \frac{1}{1 + RCs} \qquad (12.3)$$

where R = 0.2 and C = 30.4.

After determining $t_1$ and $t_2$, the time delay $\theta$ and the process time constant $T$ can be obtained from

$$\theta = 1.3\,t_1 - 0.29\,t_2 = 0.525, \qquad T = 0.67\,(t_2 - t_1) = 6.1841$$

The transfer function can now be represented in the equivalent first-order time delay form [17]

$$G(s) = \frac{K}{Ts + 1}\exp(-\theta s)$$

$$G(s) = \frac{K}{6.1841s + 1}\exp(-0.525s) \qquad (12.4)$$

where $K$ denotes the process gain.

12.2.2 Description of Metaheuristic Techniques 
12.2.2.1 Flower Pollination Algorithm (FPA) 


The flower pollination algorithm is a nature-inspired algorithm developed by Xin-She Yang in 2012. In FPA, the pollination process occurs through carriers such as wind, insects, birds, bats, and so on. Flower pollination deals with the transfer of pollen from a flower's male organ to its female part [18] via wind, water, insects, or other animals, and it is the means by which flowering plants reproduce. The pollination process can be of two types: biotic and abiotic. Biotic pollination involves living pollinators such as birds, insects, and so on, which carry pollen from one flower to another. Abiotic pollination involves nonliving carriers like wind, water, and so on to transfer pollen [19, 20]. It has been estimated that about 90% of flowering plants rely on biotic pollination, which needs pollinators for the reproduction of plants, and about 10% are pollinated without any pollinator, leading to abiotic forms of pollination. There are two methods of pollination, namely self-pollination and cross-pollination. Self-pollination arises from pollen of the same flower or of different flowers of the same plant without the aid of any pollinator, whereas cross-pollination uses pollen from a flower of a different plant [21]. Flower constancy is the tendency of pollinators to visit only certain flower species, which increases the reproduction of those species. FPA relies on the following four rules for pollination [21]:


1. Biotic and cross-pollination are regarded as a global pollination process in which pollen-carrying pollinators perform Lévy flights.


2. Abiotic and self-pollination are regarded as local pollination.


3. Flower constancy is regarded as a reproduction probability that is proportional to the similarity of the two flowers involved.


4. A switch probability p ∈ [0, 1] controls the choice between local and global pollination. Because of physical proximity and other factors such as wind and water, local pollination may contribute a significant fraction of the overall pollination activity.


FPA thus has two fundamental steps, namely the global pollination and local pollination processes. In global pollination, pollen is carried by insects, birds, bats, and so on over long distances because these pollinators can fly over a longer range. This process favors the reproduction of the fittest solution, denoted $g^*$ (the best solution). Global pollination can be written mathematically as

$$X_i^{t+1} = X_i^t + L(\lambda)\,(X_i^t - g^*) \qquad (12.5)$$

Here, $X_i^t$ is the pollen $i$, or solution vector $X_i$, at iteration $t$, $g^*$ is the current best solution among all solutions in the current iteration, and $L(\lambda)$ is a step size. Since pollinators may move over a large area with various distance steps, a Lévy flight [22] is used to express this phenomenon. For $L > 0$, the Lévy distribution can be written as

$$L \sim \frac{\lambda\,\Gamma(\lambda)\,\sin(\pi\lambda/2)}{\pi}\,\frac{1}{s^{1+\lambda}} \qquad (12.6)$$

Here $\Gamma(\lambda)$ is the standard gamma function, and the Lévy distribution is valid for large steps $s > 0$. Local pollination is expressed mathematically using Rule 2 and Rule 3 as

$$X_i^{t+1} = X_i^t + \epsilon\,(X_j^t - X_k^t) \qquad (12.7)$$

Here $X_j^t$ and $X_k^t$ are pollen from different flowers of the same plant species. If $X_j^t$ and $X_k^t$ come from the same species or are selected from the same population, this is equivalent to a local random walk when $\epsilon$ is drawn from a uniform distribution in [0, 1]. In current research, FPA is useful in different domains [23-25]: process control optimization, renewable energy, the chemical industry, the biomedical field, and so on (Table 12.1).


TABLE 12.1
Parameters of FPA

Population size                  20
Switch probability               0.8
Number of iterations             20-100, interval of 20
Number of search variables       3
Range of the search variables    -25 to 25
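
The global and local pollination steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the reported results: Lévy steps are drawn with Mantegna's algorithm, which yields steps of either sign; the scaling factor `gamma` and the exponent `lam = 1.5` are assumed values; a candidate replaces its parent only if it improves the objective; and the function names are hypothetical. The defaults for population size, switch probability, number of variables, and search range follow Table 12.1.

```python
import math
import numpy as np


def levy_steps(rng, size, lam=1.5):
    """Draw Levy-distributed steps with Mantegna's algorithm (an assumed choice)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / lam)


def flower_pollination(objective, dim=3, pop=20, p_switch=0.8, lower=-25.0,
                       upper=25.0, iterations=100, gamma=0.1, seed=0):
    """Minimize objective over [lower, upper]^dim with a basic FPA loop."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, (pop, dim))
    fitness = np.array([objective(x) for x in X])
    best = X[fitness.argmin()].copy()
    best_fit = float(fitness.min())
    for _ in range(iterations):
        for i in range(pop):
            if rng.random() < p_switch:
                # Global pollination: Levy flight relative to the best solution
                candidate = X[i] + gamma * levy_steps(rng, dim) * (X[i] - best)
            else:
                # Local pollination: random walk between two distinct flowers
                j, k = rng.choice(pop, size=2, replace=False)
                candidate = X[i] + rng.random() * (X[j] - X[k])
            candidate = np.clip(candidate, lower, upper)
            f = float(objective(candidate))
            if f < fitness[i]:  # greedy selection keeps the better flower
                X[i], fitness[i] = candidate, f
                if f < best_fit:
                    best, best_fit = candidate.copy(), f
    return best, best_fit


# Hypothetical test objective: the sphere function, minimized at the origin.
sphere = lambda x: float(np.sum(x ** 2))
best, best_fit = flower_pollination(sphere)
```

For PID tuning, the three search variables would be the controller gains and the objective would be one of the error indices of the closed-loop response.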
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12.2.2.2 Bacterial Foraging Optimization Algorithm (BFOA) 


BFOA is also a nature-inspired algorithm, developed by Kevin M. Passino in 2002. It is a biologically inspired mechanism that mimics the foraging activities of Escherichia coli (E. coli) bacteria that reside in the human intestine. The objective of this algorithm is to eliminate weakly foraging bacteria and keep the strongly foraging bacteria so as to maximize the energy obtained per unit time. The E. coli bacterium consists of a plasma membrane, a cell wall, and a capsule containing cytoplasm and nucleoid. Locomotion is achieved with the help of flagella. The bacteria can travel in two different ways, namely tumbling and swimming [26]. When the flagella rotate in the clockwise direction, each flagellum pulls on the cell and the bacterium tumbles. When the flagella rotate in the anticlockwise direction, each flagellum pushes the cell, causing the bacterium to swim at a faster rate in search of food [27]. These two processes alternate to move the bacterium in different directions in search of nutrients. Bacteria can interact with each other by sending various signals. With the help of tumbling and swimming, the bacteria can travel longer distances toward higher concentrations of food and avoid harmful places. There are four major steps in the BFO algorithm [28], as follows:

Chemotaxis: Chemotaxis is the basic step of the bacterial foraging process. The E. coli bacterium can move in two different ways, swimming and tumbling. In swimming, the bacterium moves in a single direction to gather nutrients, and in tumbling, the bacterium changes its direction to follow another nutrient gradient. Let $\theta^i(j,k,l)$ denote the i-th bacterium at the j-th chemotactic step, k-th reproduction step, and l-th elimination-dispersal step. The movement of the bacterium is represented as follows:

$$\theta^i(j+1,k,l) = \theta^i(j,k,l) + C(i)\,\frac{\Delta(i)}{\sqrt{\Delta^T(i)\,\Delta(i)}} \qquad (12.8)$$

Here, $C(i)$ is the step size during each swim or tumble and $\Delta(i)$ represents a random vector whose elements lie in [-1, 1].
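
A single chemotactic move of the kind described above can be sketched as follows. The random direction vector has elements drawn uniformly from [-1, 1] and is normalized to unit length, so the bacterium always moves a distance equal to the step size; the function name and the numerical values are hypothetical.

```python
import numpy as np


def chemotaxis_step(theta, step_size, rng):
    """Move one bacterium by step_size along a random unit direction."""
    delta = rng.uniform(-1.0, 1.0, theta.shape)  # random tumble vector
    direction = delta / np.sqrt(delta @ delta)   # normalize to unit length
    return theta + step_size * direction


rng = np.random.default_rng(0)
theta = np.array([1.0, 2.0, 3.0])  # hypothetical position, e.g. three PID gains
new_theta = chemotaxis_step(theta, 0.1, rng)
```

In a full BFO loop the bacterium would keep swimming in the same direction, up to the swim length, for as long as the objective continues to improve.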

Swarming: A group of bacteria arranges itself into a traveling ring by moving up the nutrient gradient. When stimulated by a high level of succinate, the cells release the attractant aspartate, which induces them to aggregate and travel as concentric patterns of swarms with a high bacterial density. The swarming process is represented mathematically as follows:

$$J_{cc}(\theta, P(j,k,l)) = \sum_{i=1}^{S} J_{cc}\big(\theta, \theta^i(j,k,l)\big) = \sum_{i=1}^{S}\left[-d_{attractant}\exp\left(-w_{attractant}\sum_{m=1}^{p}(\theta_m - \theta_m^i)^2\right)\right] + \sum_{i=1}^{S}\left[h_{repellant}\exp\left(-w_{repellant}\sum_{m=1}^{p}(\theta_m - \theta_m^i)^2\right)\right] \qquad (12.9)$$

Here, $J_{cc}(\theta, P(j,k,l))$ is the objective function value to be added to the original objective function, $S$ is the total number of bacteria, $p$ is the number of variables to be optimized, and $\theta = [\theta_1, \theta_2, \ldots, \theta_p]^T$ is a point in the p-dimensional search domain. $d_{attractant}$, $w_{attractant}$, $h_{repellant}$, and $w_{repellant}$ are different coefficients that should be chosen properly.

Reproduction: After these two steps some bacteria have a good amount of food and some bacteria have less food. Those bacteria that have enough food will survive, and the rest of them will die. In this step, each of the healthier bacteria splits into two bacteria, which are placed at the same location. This keeps the size of the swarm fixed, and the healthier bacteria keep reproducing.

Elimination and Dispersal: When the temperature of an area with a high nutrient gradient suddenly increases, all the bacteria in that area may be destroyed, or a group may be dispersed into a new area. This process relocates the bacteria to a new place to escape a noxious environment (Table 12.2).

TABLE 12.2
Parameters of BFO

S: Number of bacteria                                                   19
D: Number of parameters to be optimized                                 3
Ns: Swimming length after which tumbling of bacteria occurs             3
Nre: Maximum number of reproductions to be undertaken                   6
Ned: Maximum number of elimination-dispersal events                     2
Ped: Probability with which the elimination-dispersal will continue     0.25
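
The reproduction and elimination-dispersal steps described above can be sketched as follows. This is an illustration under stated assumptions: health is taken as an accumulated cost, so a lower value means a healthier bacterium; with an odd swarm size the healthier half is rounded up so that the swarm size stays fixed; dispersed bacteria are re-initialized uniformly within assumed search bounds; and the function names are hypothetical.

```python
import numpy as np


def reproduce(positions, health):
    """Keep the healthiest bacteria and clone them so the swarm size is unchanged."""
    order = np.argsort(health)           # lowest accumulated cost first
    n = len(positions)
    survivors = positions[order[: n - n // 2]]
    clones = positions[order[: n // 2]]  # copies placed at the same locations
    return np.concatenate([survivors, clones.copy()])


def eliminate_and_disperse(positions, p_ed, lower, upper, rng):
    """Relocate each bacterium to a random position with probability p_ed."""
    dispersed = rng.random(len(positions)) < p_ed
    new_positions = positions.copy()
    new_positions[dispersed] = rng.uniform(
        lower, upper, (int(dispersed.sum()), positions.shape[1]))
    return new_positions


rng = np.random.default_rng(0)
swarm = rng.uniform(-25.0, 25.0, (19, 3))  # S = 19 bacteria, 3 parameters
health = np.arange(19, dtype=float)        # hypothetical accumulated costs
after_reproduction = reproduce(swarm, health)
after_dispersal = eliminate_and_disperse(after_reproduction, 0.25, -25.0, 25.0, rng)
```

The search bounds of -25 to 25 are assumed here only for illustration.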


12.3 Results and Discussion 


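$$\mathrm{ITAE} = \int_0^{\infty} t\,|e(t)|\,dt \qquad (12.10)$$

$$\mathrm{IAE} = \int_0^{\infty} |e(t)|\,dt \qquad (12.11)$$

$$\mathrm{MSE} = \frac{1}{t}\int_0^{t} \big(e(\tau)\big)^2\,d\tau \qquad (12.12)$$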

TABLE 12.3
PID Tuning Parameters by BFO Algorithm for Ordinary Transfer Function

Number of Iterations   kp       kd       ki
20                     7.7939   7.1118   4.3870
40                     6.4854   2.2700   2.3294
50                     8.4119   5.7796   2.0940
60                     4.9076   3.9160   1.4720
80                     8.6776   5.8946   1.9139
100                    6.8421   7.3495   1.8635

TABLE 12.4
Optimal Value and Elapsed Time of BFO for Ordinary Transfer Function

Number of Iterations   Optimal Best Fitness Value   Elapsed Time
20                     49.7776                      174.063899
40                     21.0245                      351.513075
50                     22.2232                      457.636032
60                     15.5019                      382.295558
80                     22.2456                      463.506272
100                    17.8902                      594.390941

TABLE 12.5
PID Tuning Parameters by FPA Algorithm for Ordinary Transfer Function

Number of Iterations   kp        kd         ki
20                     17.9615   -11.3237   1.1328
40                     16.5188   -25.0000   1.0868
50                     9.3039    -25.0000   1.0097
60                     24.6658   -8.4462    1.1087
80                     17.2079   -25.0000   1.0179
100                    8.9698    -25.0000   1.2150

TABLE 12.6
Optimal Value and Elapsed Time of FPA for Ordinary Transfer Function

Number of Iterations   Optimal Best Fitness Value   Elapsed Time
20                     1.311                        29.954760
40                     0.56249                      60.14014
50                     1.8654                       74.006416
60                     1.2802                       105.587651
80                     1.0123                       127.323537
100                    2.6779                       153.72954

TABLE 12.7
PID Tuning Parameters by FPA Algorithm for FOPD Transfer Function

Number of Iterations   kp         kd         ki
20                     -13.1299   -1.0278    18.5636
40                     22.1051    11.6880    -12.5917
50                     -20.9579   16.1479    19.6661
60                     -25.0000   -11.1245   -0.5083
80                     3.4650     -7.7788    -18.7215
100                    20.1734    0.0510     -1.0162

TABLE 12.8
Optimal Value and Elapsed Time of FPA for FOPD Transfer Function

Number of Iterations   Optimal Best Fitness Value   Elapsed Time
20                     123.22                       33.757265
40                     98.4234                      108.490897
50                     88.3684                      162.032953
60                     96.9638                      143.382602
80                     89.2301                      119.038884
100                    50.8714                      148.041570

TABLE 12.9
PID Tuning Parameters by BFO Algorithm for FOPD Transfer Function

Number of Iterations   kp        kd        ki
20                     30.5040   10.8062   42.2983
40                     26.9336   27.3297   34.8356
50                     19.4059   17.4537   33.1396
60                     27.3052   13.2623   16.6599
80                     8.8490    21.0055   27.9986
100                    19.6958   27.7809   36.5638

TABLE 12.10
Optimal Value and Elapsed Time of BFO for First Order Plus Dead Time (FOPD) Transfer Function

Number of Iterations   Optimal Best Fitness Value   Elapsed Time
20                     21.2406                      189.63421
40                     21.0245                      244.562542
50                     20.9767                      292.31456
60                     23.4052                      338.935617
80                     21.0680                      494.977125
100                    20.9725                      622.434979

TABLE 12.11
Comparative Study for Time Domain Analysis

Parameter       BFO-Ordinary (100)   FPA-Ordinary   BFO-FOPD   FPA-FOPD
Rise Time       8.9030               23.1515        NaN        9.0639e-04
Settling Time   26.0901              42.4131        NaN        0.1409
Settling Min    3.8326               7.3878         NaN        -2.2216
Settling Max    4.4680               8.2053         NaN        0.4844
Overshoot       6.1719               0              NaN        128.0934
Undershoot      0                    15.1753        NaN        506.7408
Peak            4.4680               8.2053         Inf        2.2216
Peak Time       18.0380              109.4433       Inf        0.0177

TABLE 12.12
Comparative Study for Error Indices after 100 Iterations

Name of the System   ITAE      IAE      MSE
BFO-Ordinary         93.249    5.17     26.728
BFO-FOPD             Inf       Inf      Inf
FPA-Ordinary         109.44    14.17    200.78
FPA-FOPD             2.249     127.09   1.61E+04


12.4 Conclusion 


Because processes in the pharmaceutical field are complex and dynamic, they frequently operate under poor conditions owing to limitations in the design of the process control system and inadequate control performance. This study aims to improve the performance of proportional + integral + differential (PID) control of the output flow rate through mathematical modeling and a search for optimal operating points. The PID tuning parameters are optimized with FPA and BFO to handle nonlinear systems whose undershoot response characteristics are hard to treat, and to improve the system response where overshoot occurs and the rise time is very long. The overall analysis is carried out for two different models, the ordinary transfer model and the first-order model with time delay.

For PID tuning of the ordinary transfer function of a cylindrical tank, the flower pollination algorithm is superior to bacteria foraging in terms of the optimum value of the objective function (Figure 12.3) and elapsed time, but some of the transient response characteristics of the system, like rise time and settling time, lag behind those of bacteria foraging optimization. For PID tuning of the first-order time delay transfer function of a cylindrical tank, the flower pollination algorithm is superior in terms of elapsed time and transient parameters, while bacteria foraging yields a lower optimum value of the objective function (Figure 12.4). Therefore, FPA is better than BFO in terms of ITAE and MSE for PID tuning of the transfer function of the cylindrical tank.

FIGURE 12.3
Optimal value of BFO and FPA for ordinary transfer function. (Horizontal axis: number of iterations.)

Artificial Intelligence for Cognitive Modeling

FIGURE 12.4
Optimal values of BFO and FPA for FOPD transfer function. (Horizontal axis: number of iterations.)


A Hybrid Algorithm Based on CSO and PSO for
Parametric Optimization of Liquid Flow Model


13.1 Introduction 


Over the past decades, optimization has played a key role in the process industry, with artificial intelligence techniques assuming a leading role in the development of process models [1]. The modern process industry uses AI to control the liquid flow rate, liquid level, mixing of chemical products, and many other possible applications. Generally, an optimization technique is used to find the optimal response corresponding to a given set of input variables of a nonlinear model [2]. The liquid flow control process is one of the most common nonlinear models, where the response variable, the flow rate, depends on a number of influencing attributes such as the type of sensor, the characteristics of the liquid, the surrounding temperature, and so on. Because of the complex relations between the multivariable inputs and the response, and a large delay time, conventional optimization techniques have limitations for this process model [3].

For most nonlinear complex process models, researchers have adopted computational optimization, as it takes less computational time and predicts the potential input attributes correctly [4, 5]. Computational optimization is generally carried out in two phases. In the first phase, a model is designed with the help of the majority of the dataset, containing input and output variables. In the second phase, the model is validated and tested against a test dataset that contains the input and output process variables [6].

A number of studies have applied AI techniques to the parametric optimization and design of gray box models of the liquid flow control process; some of these are the neural network control model [7, 8], fuzzy logic controller [9, 10], genetic algorithm [11-13], hybrid GA-ANN model [14, 15], ANFIS model [16, 17], and so on. Empirical models like analysis of variance (ANOVA), response surface methodology (RSM), and ANN are used to represent the training dataset of a nonlinear process plant, and metaheuristic optimization is then used to find the optimal process parameters so that the test model best fits the experimental dataset. Several studies have performed parametric optimization of empirical models by different metaheuristic optimization techniques for the liquid flow process industry; some of them are FPA-ANN [18], FPA-ANOVA [19], FPA-RSM [20], and improved versions of the original elephant swarm water search algorithm (ESWSA) based ANOVA and RSM models [21]. In addition, to improve convergence speed, a hybrid particle swarm optimization and grey wolf optimization (HPSOGWO) has also been proposed to implement this complex model [22]. All of these AI models operated within the constraint boundary conditions and under fluctuation of complex features. The outcomes might yet be improved, though.


DOI: 10.1201/9781003216001-15


Therefore, estimating a highly effective model of the flow rate control process remains an open issue. Particle swarm optimization is a very effective algorithm for finding the solution of any nonlinear objective function, with the main aim of reducing the computational time and transfer time; it also yields a better solution and runs faster than other metaheuristic algorithms. However, its solution may become trapped in local optima, and because of its fast convergence its results may not be accurate. On the other hand, the genetic algorithm provides the global optimum solution, but it requires a significant amount of computational time, as it improves the result by increasing the number of iterations as well as the number of computational steps. Hence, in this research we propose a hybrid optimization algorithm, HGAPSO, that takes advantage of both PSO and GA.

Compared with other strategies with the same objectives, the HGAPSO technique is expected to run faster with varied sizes of workflow applications [23-26]. Furthermore, because the GA mutation operator is used to improve the accuracy of the solutions found for many complicated and nonlinear problems, the HGAPSO algorithm may not get caught in a local optimal solution.

The simulation results indicate that the optimization approach is both efficient and practical for meeting the actual requirements of the liquid flow control process. The remainder of this chapter is arranged as follows. After this introduction, Section 13.2 describes the experimental setup of the liquid flow control process, and Section 13.3 gives a brief mathematical description of the modeling of that process. The proposed approach is explained in Section 13.4; the results and discussion, and finally the conclusions, are presented in Sections 13.5 and 13.6, respectively.


13.2 Experimental Setup of the Liquid Flow Control Process


The flow and level measurement and control setup [10] (Figure 13.1, model number WFT-20-I) was used to perform the research. In all, it provided 134 random samples, and four input variables (pipe diameter, sensor output voltage, viscosity, and liquid (water) conductivity) have been taken into account in this investigation.


FIGURE 13.1
Experimental setup for liquid flow rate measurement [10]. (Labeled components: flow sensor, rotameter, and liquid flow paths.)


13.3 Modeling of the Liquid Flow Process


It is particularly difficult to determine the process input variables, such as changes in liquid properties and pipe diameter, that achieve the best possible liquid flow rate and level using traditional controller approaches, because of the nonlinear characteristics of the liquid flow rate and liquid level process. Computational intelligence techniques are therefore essential for optimizing complicated mathematical models like ANOVA [11, 27] and RSM [28] (Figure 13.2).

FIGURE 13.2
Flowchart of the proposed hybrid HCSPSO model. The flowchart comprises the following steps: (1) initialize the number of nests, the number of iterations, the dimension of the problem, the target fitness value, and other constants such as Pa, C1, C2, W, and Vmax; (2) generate the initial nests to find the current best solution; (3) construct the cuckoo search by using Lévy flights; (4) evaluate the fitness and search for the best nest; (5) each cuckoo updates the position and velocity of the PSO; (6) find the best fitness value from Gbest; (7) check the termination criteria and repeat if they are not met; (8) obtain the optimum solution.

To characterize accurately the quantitative relationship between the input variables and the response of the liquid flow control process, we have employed the well-known nonlinear power equation based on ANOVA [11, 12]. In the mathematical model, the flow rate (F) is described in terms of the sensor output (E), pipe diameter (D), liquid conductivity (k), and viscosity (n) as follows:


F = LE? DO pes quo (13.1) 


where μ1, μ2, μ3, μ4, and μ5 are the parameters of the quantitative model. Methods of
computational intelligence are now employed to extract the optimal values of these
coefficients from the dataset of empirical observations. Three distinct metaheuristic
optimization strategies are used here to reduce the divergence between the measured and
simulated flow rates, so that the modeled process variable conforms as closely as possible
to the observed one. RMSE was therefore employed in this study as the objective function
that the metaheuristics are required to minimize.


$$\mathrm{RMSE}(X) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} f(E_i, D_i, k_i, n_i, X)^2} \tag{13.2}$$


where X is the set of model parameters to be estimated, N is the total number of
experimental data, and f is the error function.

When modeling the ANOVA data, the error function f(E_i, D_i, k_i, n_i, X) and the set of
parameters X can be expressed as


$$f(E_i, D_i, k_i, n_i, X) = \mu_1 E_i^{\mu_2} D_i^{\mu_3} k_i^{\mu_4} n_i^{\mu_5} - F_i \tag{13.3}$$


$$X = \{\mu_1, \mu_2, \mu_3, \mu_4, \mu_5\} \tag{13.4}$$


where F_i is the experimental flow rate.
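
As a minimal sketch of Eqs. (13.1) to (13.4), the following Python fragment evaluates the power model and the RMSE objective for a candidate parameter set. The function names, the generating parameters, and the three input rows are hypothetical and serve only as an illustration; they are not taken from the experimental dataset.

```python
import math

def flow_model(E, D, k, n, X):
    """Power model of Eq. (13.1): F = mu1 * E^mu2 * D^mu3 * k^mu4 * n^mu5."""
    mu1, mu2, mu3, mu4, mu5 = X
    return mu1 * E**mu2 * D**mu3 * k**mu4 * n**mu5

def rmse(X, data):
    """RMSE objective of Eq. (13.2); each data row is (E, D, k, n, F_measured)."""
    squared = [(flow_model(E, D, k, n, X) - F) ** 2 for E, D, k, n, F in data]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical inputs (not experimental values); targets come from a known parameter set.
true_X = (2.0, 1.5, -1.0, 0.5, -0.5)
inputs = [(0.20, 0.020, 0.60, 0.90), (0.25, 0.025, 0.70, 0.80), (0.30, 0.030, 0.65, 0.85)]
data = [(E, D, k, n, flow_model(E, D, k, n, true_X)) for E, D, k, n in inputs]

print(rmse(true_X, data))                        # zero for the generating parameters
print(rmse((2.0, 1.5, -1.0, 0.5, -0.4), data))   # positive for a perturbed parameter set
```

Any of the metaheuristics discussed in this chapter can then be applied to `rmse` as a black-box objective over the five parameters.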


13.4 Proposed Methodology 
13.4.1 Hybrid CSPSO


This section discusses a hybrid algorithm that combines the CSO and PSO strategies. The
schematic in Figure 13.2 provides the critical steps of the hybrid CSPSO (HCSPSO) method.
PSO is one of the most effective methodologies; however, because it converges so quickly,
it typically converges prematurely in complicated problems [29]. The CSO method has been
used to solve numerous complicated problems, in which it surpassed competing algorithms
[30]. The CSO method may converge a little more slowly, but it has a higher exploration
capability. No single method is capable of solving all optimization problems adequately.
The global optimum can be obtained by combining the existing techniques to handle this
problem [31]. The hybrid method can enhance the computational efficiency and the global
optimization capability, and it is less likely to become trapped in local minima. The
hybrid algorithm may thus combine the best characteristics of both algorithms.
13.4.2 Parameters Setting


The configuration parameters of each algorithm in the test are as follows:


1. For CSO, the extra range factor for crossover (γ) is 0.4, the crossover percentage (pc)
   is 0.7, the mutation rate (mu) is 0.1, and the mutation percentage (pm) is 0.3, as
   indicated by the prior work [12, 32].


2. For PSO, the inertia weight damping factor (w_damp) is 0.99, the inertia weight (w) is
   1, the global learning coefficient (c2) is 2, and the personal learning coefficient
   (c1) is 1.5, according to the earlier work [33].


3. For HCSPSO, the mutation rate is 0.05, the global learning coefficient (c2) is 2, r1 and
   r2 are random numbers in (0, 1), and the degrees of importance a1, a2, and a3 are each
   0.4 [34].


We specify a maximum of 5000 iterations and a maximum population of 100 for each method.
To obtain optimum estimates for the liquid flow model, the search space is constrained to
a 5-dimensional function optimization problem over [μ1, μ2, μ3, μ4, μ5], as already shown
in Equation (13.4). The search range [7] for the optimization of the liquid flow-based
model is (-15, 15).
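
The PSO settings listed above can be illustrated with a short, self-contained Python sketch. It covers only the plain PSO component, not the hybrid HCSPSO, and it is an illustration under stated assumptions rather than the MATLAB code used in this chapter: the velocity limit `v_max`, the reduced population and iteration counts (chosen so that the example runs quickly), and the `sphere` test objective are all assumptions introduced here.

```python
import random

def pso(objective, dim=5, bounds=(-15.0, 15.0), n_particles=30, n_iter=200,
        w=1.0, w_damp=0.99, c1=1.5, c2=2.0, seed=0):
    """Plain PSO with inertia-weight damping; returns best position, cost, and history."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # assumed velocity limit, not given in the text
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # personal term
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # global term
                vel[i][d] = max(-v_max, min(v_max, v))
                # keep the particle inside the search range (-15, 15)
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
        w *= w_damp  # damp the inertia weight after every iteration
        history.append(gbest_cost)
    return gbest, gbest_cost, history

def sphere(x):
    """Stand-in objective; in the chapter the objective is the RMSE of Eq. (13.2)."""
    return sum(xi * xi for xi in x)

best, best_cost, history = pso(sphere)
print(best_cost)
```

To mirror the settings of this section, `n_particles=100` and `n_iter=5000` would be passed instead of the smaller defaults, and `sphere` would be replaced by the RMSE objective.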

Figure 13.3 shows the overall process followed in the present research. Initially, the
total of 134 experimental data was divided into two parts: 117 data for training the
model and 17 data for testing. ANOVA is used to build a nonlinear model that represents
the present complex flow process. After the coefficients of the ANOVA model are obtained,
the model is tested with the three algorithms, PSO, CSO, and HCSPSO, to get the actual
coefficients and the optimum value of the objective function. Finally, the best fitted
model is identified on the basis of statistical criteria.


FIGURE 13.3
Flowchart of the overall process. The flowchart comprises the following steps:

1. Start from the 134 experimental data with flow rate, pipe diameter, and liquid
   properties.
2. Use 90% of the data to train the model and 10% of the data to test the model.
3. Find the coefficients of the nonlinear ANOVA model.
4. Test the model using the proposed algorithm and the ranges offered by ANOVA to get the
   actual coefficients of the model and optimize the objective function.
5. Find the optimal condition of the model.
6. Calculate other statistical parameters to find the best model.


13.5 Performance Analysis


For the modeling of the liquid flow process, we used MATLAB 2013b on a PC with an Intel(R)
Core(TM) i3 processor, 4 GB RAM, and the Windows 7 operating system. In these numerical
experiments, we evaluated and compared the performance of the considered algorithms using
several tests, namely the fitness test, computational efficiency test, reliability test,
convergence test, and accuracy test, which are described separately in the following
subsections [35]. Finally, the overall performance is summarized.


13.5.1 Computational Efficiency Test 


Computational time is one of the key factors for assessing the efficacy of a bio-inspired
optimization approach used for parametric optimization. In this subsection, we calculated
the average execution time of each method for a specified number of runs of the program
(5000 iterations, a population of 100, and 10 executions of the program). A comparison
based on average execution time is presented in Table 13.1.

Table 13.1 shows that the HCSPSO model has the lowest average computational time.


13.5.2 Convergence Speed 


A comparison of final results alone cannot describe the search performance of an
optimization technique. An optimization technique is said to have better convergence speed
when it gives the lowest RMSE over several runs as the number of iterations is varied
[36, 37]. Here we performed the convergence test by varying the number of iterations from
100 to 5000 and running each case 20 times. All the algorithms, HCSPSO, PSO, and CSO, were
run to obtain the minimum fitness. Figure 13.4 shows the convergence test for all the
optimization tools. The graph shows that HCSPSO converges better than the other two basic
metaheuristic optimization techniques.


13.5.3 Accuracy Test 


The accuracy test determines how closely the computed values match the corresponding
values obtained under the existing experimental settings. To quantify the discrepancy
between the measured and calculated flow rates, we employed two error indices built on
the individual absolute error (IAE) of Eq. (13.5): the mean absolute percentage error
(MAPE) and the mean absolute error (MAE), given by Eqs. (13.6) and (13.7).


$$\mathrm{IAE}_i = \left| F_{\mathrm{measured},i} - F_{\mathrm{calculated},i} \right| \tag{13.5}$$


TABLE 13.1
Comparative Study Based on Computational Time

Method     Average Computational Time (s)
CSO        159.174
PSO        119.0112
HCSPSO     21.197

FIGURE 13.4
Comparative study based on convergence speed.


$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\mathrm{IAE}_i}{F_{\mathrm{measured},i}} \tag{13.6}$$


Moreover, MAE can be defined as:


$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \mathrm{IAE}_i}{n} \tag{13.7}$$

F_measured and F_calculated are the experimental and estimated values of the liquid flow
rate, respectively, and n is the number of experimental datasets (Table 13.2). The best
optimization algorithm consistently delivered runs with the lowest RMSE. The PSO, CSO, and
HCSPSO MATLAB code is used to calculate the coefficients of the nonlinear model, which are
presented in Table 13.2. The prediction error can be calculated using RMSE, which is
defined as follows:


$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\frac{X_{\mathrm{exp},i} - X_{\mathrm{cal},i}}{X_{\mathrm{exp},i}}\right)^{2}} \times 100\% \tag{13.8}$$

TABLE 13.2
Estimated Optimal Parameters by Using PSO, CSO, and HCSPSO Based Modeling of Liquid Flow
Control Process

Method     μ1        μ2        μ3        μ4        μ5
PSO        15.00     10.0466   -1.0550   -3.6296   -0.6247
CSO        15.00     8.4074    -0.7271   -1.3846   -0.7300
HCSPSO     9.4281    8.3666    -1.2252   1.7282    -1.1319
TABLE 13.3
Comparative Study Based on Mean Absolute Percentage Error (MAPE) and Mean Absolute Error
(MAE) for PSO, CSO, and HCSPSO

Method     Mean Absolute Percentage Error (MAPE)     Mean Absolute Error (MAE)
PSO        16.60                                     0.036
CSO        18.41                                     0.039
HCSPSO     2.84                                      0.012


$$\mathrm{Accuracy} = (100 - \mathrm{RMSE})\% \tag{13.9}$$


where X_exp is the experimental value, X_cal is the calculated value, and m is the number
of training data. Table 13.3 shows that CSO gives the largest mean absolute percentage
error (MAPE) and mean absolute error (MAE). Table 13.4 shows that HCSPSO optimization has
the lowest RMSE and the highest accuracy, while there is no significant difference between
CSO and PSO. Figure 13.5 compares the performance metrics (computational time and MAPE) of
CSO, PSO, and HCSPSO. Figure 13.6 compares MAE and RMSE for all the algorithms applied in
the present research. Figure 13.7 depicts the relative errors of the PSO, CSO, and HCSPSO
based models for the individual liquid flow rate measurements. Compared with the other
optimization strategies, the proposed HCSPSO optimization has the lowest relative error.


TABLE 13.4
Comparative Study Based on Root Mean Square Error (RMSE) and Accuracy

Method     RMSE      Accuracy
PSO        0.0442    99.9558
CSO        0.0492    99.9508
HCSPSO     0.0212    99.9788

FIGURE 13.5
Performance metric comparison for PSO, CSO, and HCSPSO.

FIGURE 13.6
Comparison characteristics about MAE and RMSE.
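
The error measures of Eqs. (13.5) to (13.9) can be sketched in a few lines of Python. This is a hedged illustration, not the code behind Tables 13.3 and 13.4: the function name and the three sample values are hypothetical, and MAPE is returned as a fraction that would be multiplied by 100 to report it in percent.

```python
import math

def error_metrics(measured, calculated):
    """Return MAE, MAPE (as a fraction), relative RMSE in percent, and accuracy in percent."""
    n = len(measured)
    iae = [abs(m - c) for m, c in zip(measured, calculated)]             # Eq. (13.5)
    mape = sum(e / m for e, m in zip(iae, measured)) / n                 # Eq. (13.6)
    mae = sum(iae) / n                                                   # Eq. (13.7)
    rel_sq = [((m - c) / m) ** 2 for m, c in zip(measured, calculated)]
    rmse_pct = math.sqrt(sum(rel_sq) / n) * 100.0                        # Eq. (13.8)
    accuracy = 100.0 - rmse_pct                                          # Eq. (13.9)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse_pct, "Accuracy": accuracy}

# Hypothetical flow rates, used only to exercise the formulas.
measured = [1.0, 2.0, 4.0]
calculated = [1.1, 1.8, 4.0]
metrics = error_metrics(measured, calculated)
print(metrics)
```

For these sample values the MAE is about 0.1, the MAPE about 0.067 (6.7%), and the relative RMSE about 8.16%, which gives an accuracy of about 91.84%.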

The empirical and computed values obtained from CSO, PSO, and HCSPSO are compared in
Figure 13.8. In comparison to the practical flow rate, CSO optimization offers an improved
estimated flow rate. Figure 13.9 plots the deviation, (X_exp - X_cal)/X_exp, against the
experimental flow rate. The graph shows that the deviation is lowest for all the applied
algorithms at very low liquid flow rates; at high flow rates the deviation is largest for
CSO, whereas the deviation of PSO and HCSPSO is nearly saturated.

FIGURE 13.7
Relative errors for CSO, PSO, and HCSPSO based modeling of liquid flow control process.

FIGURE 13.8
Comparisons of the characteristics of the experimental data and estimated liquid flow rate
using CSO, PSO, and HCSPSO based model.

FIGURE 13.9
Deviation versus Experimental Flow Rate.


13.6 Finding Optimal Condition for Liquid Flow 


In the next stage of this work, the ideal pipe diameter, liquid viscosity, liquid
conductivity, and sensor output voltage are searched for in order to determine the optimum
conditions for the liquid flow process. A new fitness or objective function (OF) is defined
as follows, where the liquid flow rate depends only on the numerical values of the
attributes, not on their units.


$$OF = F = \mu_1 E^{\mu_2} D^{\mu_3} k^{\mu_4} n^{\mu_5}$$
$$OF = 9.4281\, E^{8.3666} D^{-1.2252} k^{1.7282} n^{-1.1319} \tag{13.10}$$


The previous section showed that, among the three applied metaheuristic algorithms,
HCSPSO gives the best fitted model, which predicts the test dataset with the highest
degree of accuracy. In this section we therefore take the ANOVA model generated for the
liquid flow rate and minimize it using HCSPSO. A set of random values for E, D, k, and n
is used as the starting solutions, or initial population. HCSPSO must then optimize these
four parameters so that the value of OF is as low as possible. The exploration ranges of
these parameters are chosen to align with the control factors.

After all iterations, the best conditions are found to be 0.218, 0.025, 0.605, and 0.898,
respectively, for pipe diameter, sensor output voltage, viscosity, and liquid conductivity.
This signifies that a suitable flow rate condition will occur when the sensor output and
conductivity are minimum, the viscosity is maximum, and the pipe diameter is moderate.


13.7 Conclusions 


The research challenge of modeling and optimizing process parameters in any nonlinear
process industry is intriguing. In the present research, we applied optimization
techniques to the nonlinear liquid flow process model. From the experiment we obtained a
total of 134 datasets, of which 117 training datasets were used for designing the model
while 17 test datasets were used for validating it. Every dataset contains the potential
input variables (liquid parameters, sensor output, and pipe diameter) and the output
variable, flow rate. The relationships between these input and output variables are
nonlinear, so determining the ideal process parameters is genuinely challenging using
conventional computing techniques.

Overall, the research was performed in three stages. In the first stage, 117 datasets
were used to design a nonlinear power model by ANOVA. In the second stage, two common
metaheuristic optimization techniques, CSO and PSO, along with the improved hybrid
optimization technique HCSPSO, were applied to test the model on the 17 test datasets and
to find the coefficients of the nonlinear ANOVA model. To identify the best fitted model,
we used three statistical parameters (MAE, MAPE, and RMSE) and the computational time. In
the last stage, we applied the HCSPSO algorithm to identify the ideal values of pipe
diameter, liquid viscosity, liquid conductivity, and sensor output voltage, so that the
best conditions for the liquid flow process may be found when the flow rate objective is
minimized.

Sections 13.5 and 13.6 mainly present the statistical analysis of the proposed algorithm.
The proposed hybrid algorithm performs better in terms of convergence speed, computation
time, and statistical error, which are the main priorities of the present research.
Moreover, the accuracy of all the applied algorithms is nearly the same and is
satisfactory.

Improving the convergence speed and computational time of optimization algorithms for
modeling complex nonlinear systems remains an open challenge. Internet of Things
(IoT)-based AI models are another direction for future work on modeling complex systems.
14


Modeling of Improved Deep Learning 
Algorithm for Detection of Type 2 Diabetes 


14.1 Introduction 


Diabetes is a disease that causes high blood sugar (glucose) levels in the human body.
It has three general types: in Type 1, the body cannot produce enough insulin; in Type 2,
the body cannot use insulin well; and the third type, gestational diabetes, occurs during
pregnancy [1]. Diabetes leads to several health issues, such as eye problems, kidney
problems, heart problems, and strokes. Because of this metabolic disorder, the human body
can neither produce nor store glucose for energy [2]. Hence, proper diagnosis and
treatment of diabetes is now one of the important challenges in medicine. The number of
active patients with diabetes is increasing rapidly, as reported by the World Health
Organization (WHO). To predict the disease, different AI techniques have been proposed in
the medical field.

In artificial intelligence (AI), deep learning is a subfield that can learn from data on
its own [3]. It is also capable of unsupervised learning, through which it can learn from
large amounts of unstructured and unlabeled data that even a human brain would need a
long time to comprehend. Deep learning uses multiple layers, as in an ANN, to extract the
potential attributes from raw data [4]. In the past few years, various methods, including
ANN and deep learning approaches, have been presented to analyze and diagnose diabetes.

There have been several studies performed for the diagnosis of diabetes using different 
feature-based deep learning algorithms; some of them are highlighted in this section. Novel 
deep learning (DL) approaches were studied [5] on simulated continuous glucose monitor- 
ing (CGM) signals for diagnosis of Type 2 diabetes. Among all the algorithms, CNN per- 
forms best with an average accuracy of 77.5%. A hybrid deep learning based restricted 
Boltzmann machine approach [6] was used to classify the states of a diabetes patient. 
Accuracy of the proposed model was about 92.10%. A deep neural network (DNN) was 
examined to predict whether a patient may have any chance of developing diabetes within 
five years, with the help of eight sample characteristics [7]. A number of machine learning
and deep learning approaches, namely CNN, VGG-16, and VGG-19, were applied for automated
classification of diabetic retinopathy [8] by analysis of fundus images with varying
illumination and fields of view. Maximum accuracy obtained by the proposed model
was about 82%. A comparative study was performed using a 5-fold and 10-fold cross
validation deep neural network [9] for the diagnosis of diabetes. From result analysis, it
was observed that the maximum accuracy obtained by 5-fold cross validation was about 98.35%.
A hybrid deep learning algorithm was modeled by a variational autoencoder (VAE), sparse
autoencoder (SAE), and convolutional neural network (CNN) for predicting diabetes. It
was observed that the maximum accuracy obtained by the CNN classifier joined with SAE 
was 92.31% [10]. A comparative study was performed with three different deep learning 
algorithms: CNN, long short term memory (LSTM), and hybrid CNN-LSTM over a Pima 
Indian diabetes dataset (PIDD). Experimental results show that the proposed hybrid model 
CNN-LSTM predicts the classified model with highest accuracy of 91.38% [11]. A DNN 
was incorporated with a restricted Boltzmann machine (RBM) to analyze the Type 1 diabe- 
tes mellitus datasets after feature selection algorithm. Maximum accuracy of the proposed 
model is about 78% [12]. A comparison between CNN and hybrid CNN-LSTM was per- 
formed for automatically diagnosing diabetes with the help of heart rate variability (HRV) 
signals taken from ECG. For 5-fold cross validation, hybrid CNN-LSTM gave the maximum
accuracy of about 95.1%, while CNN gave an accuracy of 93.6% [13]. An e-nose was designed
for detecting three different classes (healthy, prediabetes, and diabetes) based on the
patient health data. In the final stage, an optimized DNN model was implemented for the
classification of the data. The proposed system successfully detected the multilevel
diabetes with an accuracy of 96.25% [14]. A methodology for prediction of diabetes using
machine learning and deep learning approaches was applied on the PIDD dataset. Among the
machine learning and deep learning approaches, deep learning achieved the maximum
accuracy, about 98.07% [15]. A stacked auto-encoder based deep learning model was used for
the classification of the Type 2 PIDD dataset, which contains 768 records with 8
attributes. The proposed algorithm achieved a maximum accuracy of about 86.26% [16].

This chapter is organized as follows. Section 14.1 contains the introduction, followed by
some state-of-the-art techniques applied to the PIDD. Section 14.2, Methodology, contains
the following subsections: a description of the datasets, the imbalanced nature of the
datasets, and how we convert this imbalanced dataset into a balanced one by using the
SMOTE technique. In Sections 14.3 and 14.4, the overall research flowchart and the basic
information on the DNN are explained. Section 14.5 presents the result analysis, followed
by the conclusions in Section 14.6.


14.2 Methodology 
14.2.1 Datasets 


The present research is based on the Pima Indian diabetes dataset (PIDD), which contains
some potential attributes of diabetes patients, or the symptoms before they feel sick.
The dataset contains only female patients, none of whom is younger than 21 years. It
contains a total of 768 cases, of which 500 samples are nondiabetic and 268 samples are
diabetic. The potential attributes of the Pima dataset are insulin level, age, blood
pressure, skin thickness, glucose, BMI, pregnancy, and diabetes pedigree function, as
shown in Table 14.1. Figure 14.1 shows box plots of the distribution and variability of
each potential attribute. The PIDD is taken from the website
https://data.world/data-society/pima-indians-diabetes-database.

Before optimization techniques are applied, the potential attributes that characterize the
dataset should be identified [17]. For this purpose, researchers have applied a
hyperparameter technique to identify the potential attributes. To handle the imbalanced
nature of the data, we used the SMOTE algorithm, which is described in the following
section. Cross validation is the process in which the experimental training output is
compared with the calculated dataset output to check whether the present model is well
fitted; after that, accuracy and other statistical metrics are calculated by applying the
test dataset to the fitted model.

TABLE 14.1
Potential Attributes of the Dataset

     Pregnancies  Glucose  Blood Pressure  Skin Thickness  Insulin  BMI   Diabetes Pedigree Function  Age  Outcome
0    6            148      72              35              0        33.6  0.627                       50   1
1    1            85       66              28              0        26.6  0.351                       31   0
2    8            183      64              0               0        23.3  0.672                       32   1
3    1            89       66              23              94       28.1  0.167                       21   0
4    0            137      40              35              168      43.1  2.288                       33   1

FIGURE 14.1
Box plots of all potential attributes (overview of the dataset).


14.2.2 Imbalanced Datasets 


In the PIDD dataset, out of 768 cases, the positive (1) and negative (0) instances number
268 and 500, respectively. The classes in the dataset are therefore unevenly distributed,
and this unequal distribution decreases the accuracy of a classification model. The
fundamental reason is that most machine learning models do not learn the positive and
negative patterns equally well when one class dominates. In this dataset, the positive
class has fewer samples than the negative class, so a model trained on the imbalanced
dataset cannot predict the disease with perfect accuracy. The majority of studies in the
literature do not account for the contribution of the minority class to the overall
classification outcome. The uneven nature of the dataset is managed efficiently by SMOTE,
which is one of the significant contributions of the proposed work. To assess the accuracy
of the model, the contributions of the majority and minority classes are recorded
individually.


14.2.3 Synthetic Minority Over-Sampling Technique (SMOTE) 


SMOTE is a well-known and efficient method for handling imbalanced datasets, that is,
datasets in which the underlying output classes are unevenly distributed [18, 20]. It is
widely used when building classifiers from such datasets [19]. SMOTE interpolates between
minority class samples, which helps the classifier to generalize. In SMOTE, the minority
class is over-sampled by creating synthetic examples [21].
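
The interpolation idea behind SMOTE can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: the function name, the choice of Euclidean distance, the neighbor count `k`, and the four sample points are all hypothetical.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward nearby minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # the k nearest minority neighbours of the chosen sample (squared Euclidean distance)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority samples (not PIDD records).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(minority, n_synthetic=6, k=2)
print(len(new_samples))
```

For the PIDD, balancing 268 positive cases against 500 negative cases in this way would require 500 - 268 = 232 synthetic positive samples.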


14.3 Proposed Flow Diagram 


Figure 14.2 depicts the proposed experimental flow. The datasets are first preprocessed
to eliminate null values and cleaned. SMOTE is then applied to the dataset to obtain an
equal number of positive and negative samples. Next, several machine learning and deep
learning techniques are applied to the dataset, and the results are obtained at the final
stage. To confirm the credibility of the results, K-fold validation is conducted.

FIGURE 14.2
Flowcharts for the proposed model. The flowchart contains the following blocks: Type 2
diabetes datasets; data preprocessing; SMOTE oversampling; training and testing dataset
division; deep neural network; classification model.
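
The K-fold validation step can be illustrated with a short Python sketch that only generates the index splits. The function name and the shuffling seed are hypothetical, and the sample count of 1000 assumes that SMOTE raises the 268 positive cases to match the 500 negative cases; the source does not state the balanced dataset size explicitly.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Assumed balanced dataset size: 500 negative + 500 positive samples.
splits = list(k_fold_indices(1000, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every record is used once for testing and k - 1 times for training.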


14.4 Deep Neural Network for Data Classification 


In this research, inspired by the intriguing properties of deep networks, an improved
DNN-based framework is presented for diabetes data classification to improve the
evaluation metrics of the classification problem. The improved DNN classifier combines a
DNN with SMOTE. The model type is "Sequential"; the activation function chosen to train
the inner layers of the network is "ReLU"; and the activation function for the final
output is "Softmax."

The sequential model consists of two hidden layers, each with 20 neurons, followed by a
Softmax output layer. The output layer delivers the diabetic and nondiabetic class
probabilities for a given record. Table 14.2 lists the parameters that were used to
simulate the model.
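
The layer structure described above (eight inputs, two hidden layers of 20 ReLU neurons, and a two-class Softmax output) can be sketched as a forward pass in plain Python. This is only an illustration of the architecture: the weights are random and untrained, the helper names are hypothetical, the order of the two output classes is an assumption, and the input record is the first row of Table 14.1 without any scaling.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random, untrained weights; a real model would learn these during training."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))      # hidden layers use ReLU
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))    # output layer uses Softmax

rng = random.Random(0)
layers = [init_layer(8, 20, rng), init_layer(20, 20, rng), init_layer(20, 2, rng)]
record = [6, 148, 72, 35, 0, 33.6, 0.627, 50]    # first row of Table 14.1
probs = forward(record, layers)
print(probs)  # two class probabilities that sum to 1
```

In practice the inputs would be normalized and the weights fitted with the settings of Table 14.2; neither step is shown here.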


14.5 Experimental Result Analysis 


Because the dataset is modest in size, the machine learning algorithms were executed on a
machine with an Intel Core i3 6th-generation processor, 8 GB of RAM, and a 500 GB hard
drive. For diabetes data classification, a deep learning model with a Softmax layer was
used. The input layer has eight neurons, which correspond to the eight attributes of the
PIDD dataset. The network has two hidden layers, each with 20 neurons, that extract the
informative features of the data. To train the model and classify the data, the Softmax
layer employs a scaled conjugate gradient technique. Fine-tuning the model's parameters
improves performance by allowing the model to learn and extract information.


TABLE 14.2
Parameters Setting for Deep Neural Network

Parameters                   Values
L2 weight regularization     0.01
Sparsity regularization      5
Sparsity proportion          0.05
Maximum epoch                1000
Learning rate                0.01
Loss function                Cross entropy


14.5.1 Performance Measure


A major problem in evaluating machine learning algorithms is that the same evaluation
metrics are generated regardless of the nature of the dataset, and the best technique is
identified on the basis of maximum accuracy alone. On an unbalanced dataset, however,
machine learning techniques give unreliable accuracy and other metrics. The proposed
study was therefore evaluated using precision, recall, the F1 measure, and the receiver
operating characteristic (ROC) curve [22]. The Matthews correlation coefficient (MCC) [23]
is a significant parameter for judging the consistency of the model, whereas the kappa
statistic is a significant metric for judging the quality of binary classifications [24].
The outcome of the present model is compared with other state-of-the-art methods. A kappa
(κ) value approaching unity means that the model fits the experimental results well;
otherwise the model is flawed (Tables 14.3-14.5).
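
The evaluation metrics named above can all be computed from the four entries of a binary confusion matrix. The following Python sketch uses the standard definitions of precision, recall, F1, MCC, and Cohen's kappa; the function name and the confusion-matrix counts are hypothetical and are not the results reported in this chapter.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mcc": mcc, "kappa": kappa}

# Hypothetical counts for a balanced test set of 200 records.
m = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(m)
```

For these counts the accuracy is 0.925, the recall is 0.9, the kappa value is 0.85, and the MCC is about 0.851, which shows how kappa and MCC stay below the raw accuracy because they discount agreement expected by chance.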


14.5.2 Comparison with Existing System 


The results of the proposed work are compared with those of other state-of-the-art
systems in order to verify its reliability.

The performance of different deep learning algorithms is shown in Table 14.6. Every deep
learning algorithm has some drawbacks for the prediction of diabetes. Several studies
have trained and tested models on the PIDD dataset. Figure 14.3 shows the deep learning
algorithms that gave the highest accuracies for the prediction of the PIDD dataset. Among
them, the 5-fold deep neural network and the hybrid CNN-LSTM offered maximum accuracies of
about 98.35% and 95.1%, respectively. The proposed algorithm, however, outperformed all
the state-of-the-art deep learning algorithms in terms of accuracy, MCC, and kappa value.


TABLE 14.3
Average Performance Measure after Experimentation

Algorithm        Avg. Accuracy (%)  Avg. Precision (%)  Avg. Recall (%)  Avg. F1 Score (%)  Computation Time (sec)  AUC (%)
SMOTE based DL   98.60              98.75               96.2             97.40              11.25                   98.5


TABLE 14.4
Statistical Measure after Experimentation

Algorithm        MCC      K value
SMOTE based DL   0.9288   0.9815


TABLE 14.5
Average Performance Measure for Positive and Negative Class after Experimentation

                 Precision             Recall                F1 Score
Algorithm        Positive   Negative   Positive   Negative   Positive   Negative
                 Class      Class      Class      Class      Class      Class
SMOTE based DL   98.5       99         94.84      97.56      96.4       98


TABLE 14.6
Comparative Study between Proposed Model and Existing Models

Source                    Algorithm/Technique                            Maximum Accuracy

G. et al. (2018)          Both the CNN and CNN-LSTM algorithms           Best accuracy obtained by CNN-LSTM
                          were used for the detection of diabetes        for 5-fold cross-validation was
                          with the help of heart rate obtained by ECG    95.1%.

Mohebbi et al. [5]        A novel adherence detection algorithm          Maximum accuracy obtained by CNN
                          using LR, MLP, and CNN                         is 77.5%.

Kamble and Patil [6]      A method for diagnosing diabetes employing     Best accuracy, F1 score, and MCC
                          a DNN and 5-fold and 10-fold cross             obtained for 5-fold cross validation
                          validation to train its properties             were 98.35%, 98, and 97, respectively.

Ashiquzzaman et al. [25]  A fully connected DNN followed by a            Maximum accuracy obtained by the
                          dropout layer                                  proposed model is 88.41%.

Present model             SMOTE-based deep neural network model          Maximum accuracy and MCC are
                                                                         98.60% and 0.9288, respectively.

FIGURE 14.3
Best deep learning algorithms for the prediction of diabetes.


14.6 Conclusions 


Proper diagnostics and disease detection in a small span of time are one of the major chal- 
lenges in medical diagnosis systems. Diabetes detection is one of the major challenges in the 
medical domain, where delayed detection may cause the multiple organ failures including 
kidney failure, stroke, blindness, heart attacks, and lower limb amputation. Hence in mod- 
ern research, artificial intelligence techniques were used to predict the disease accurately in 
a very short span of time, which helps in successful treatments. Deep learning is a subset 
of machine learning in AI, which has the capability of self-learning from large amounts of 
unstructured and unlabeled data. 

As we all know, the early detection of diabetes is important as it causes other fatal dis- 
eases for human life. Many researchers have proposed different methodologies about 
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machine learning and deep learning algorithms for the better prediction of test datasets of 
diabetes. Among all approaches which use the same dataset (PIDD) for their model train- 
ing and testing, the deep neural network performed better. We employed 768 datasets 
with 17 attributes in this study, with 268 cases in the positive class (1) and 500 in the nega- 
tive class (0). Within the dataset, there is an unbalanced distribution of classes. One of the 
key causes of decreased classification model accuracy is unequal distribution. The goal of 
the synthetic minority over-sampling technique (SMOTE) is to interpolate data from 
minority class samples in order to increase their numbers. This aids in classification gen- 
eralization, and finally, a deep neural network is utilized to forecast the testing datasets. 
For the model validation we used the confusion matrix parameters of accuracy, recall, and 
F1 score and the statistical parameters of Mathew's correlation coefficient (MCC) and 
Kappa. In Table 14.5 we explain that the proposed SMOTE prediction is much better than 
the previous state-of-the-art techniques utilized for the prediction of diabetes. 

The present research can be extended in the following ways. Because deep learning
predicts accurately on image datasets, future work can use image data such as ECG
recordings, facial images, and e-nose tests of diabetes patients to evaluate the
performance of the deep learning algorithm. To improve the prediction of diabetes,
IoT-based data can also be used to evaluate deep learning performance with reduced
signal bandwidth and less computational time.
Human Activity Recognition (HAR), Prediction,
and Analysis Using Machine Learning


15.1 Introduction 


By monitoring someone's daily behaviors, one can learn about that person's personality
and psychological state [1-3]. Following this idea, researchers are actively studying
human activity recognition (HAR), which aims to anticipate human behavior using
technology. HAR is now one of the crucial research areas in computer vision and machine
learning. Although collecting motion data was challenging in the past, modern technical
advances make it easier for researchers to gather data using portable devices [4].

Machine learning (ML) is a subfield of artificial intelligence that focuses on using data
to predict outcomes or assist in making decisions [5-7]. Forecasting a categorical outcome
is a classification problem [8, 9]. Both academics and practitioners in behavior analysis
frequently base their decisions on data. These decisions could involve determining whether
an independent variable had an impact on a behavior, selecting an assessment, identifying
the function of a behavior, or forecasting whether an intervention will result in
significant behavior changes in a particular person. Because such decisions are partly
subjective, they may vary between behavior analysts; broader application of ML to behavior
analysis is one possible answer to this problem [10-12]. ML is widely applicable in the
experimental analysis of behavior and in translational research. Additionally, some
algorithms might make it easier to identify variables linked to particular behaviors
that can be challenging to isolate empirically. Hypotheses that would be challenging to
evaluate with real animals might instead be tested through simulation with machine
learning [13]. In supervised ML, a model is trained on prior observations to forecast
results on new samples using computerized instructions. Supervised ML algorithms have in
recent years been investigated as helpful tools to enhance decision-making in a variety of
sectors, including medicine, education, renewable energy, and health care [14]. ML research
has been conducted in several domains, including agriculture [15], breast cancer [16],
autism diagnosis [17], detection of unsafe workplace behavior [18], and renewable
energy [10, 11].

One motivation of this research is to broaden the use of ML so that it addresses not only
training on or clustering a dataset but also decision making in behavior analysis. The
rest of the chapter is organized as follows. Section 15.2 discusses previous works;
Section 15.3 presents the proposed methodology for HAR; and Sections 15.4-15.6 cover the
analysis of the machine learning algorithms, the result analysis, and the conclusion and
future scope of the present research, respectively.


15.2 Related Works 


Several research works have been carried out in the past; some of them are highlighted
here. Data collected by smart home sensors have been analyzed with a long short-term
memory (LSTM) model for human activity recognition, followed by data acquisition in an
ANN [19, 20]. A smartphone-sensor-based physical activity monitoring device has been
proposed [21-27] to monitor physical activities such as walking, jogging, climbing,
cycling, and driving. To determine its accuracy, four different ML classifiers are used:
naive Bayes, decision trees, k-nearest neighbor (KNN), and support vector machine (SVM).
They attain a true positive rate of more than 95% and a false positive rate of less than
1.5%. A robust HAR system has been implemented with the help of coordinate transformation,
PCA, and online SVM [28, 29]. To account for the impact of orientation fluctuations,
coordinate transformation and principal component analysis (CT-PCA) has been utilized.
Their proposed online support vector machine (OSVM) is independent of the device location
and makes use of only a small amount of information from the unseen location. Using
smartphone accelerometer data, [30, 31] offer a HAR method that handles variable location
and orientation. In the model, the authors additionally incorporate both generic and
site-specific SVM. Many researchers primarily use general machine learning techniques
since deep learning (DL) approaches need a large amount of data [32].


15.3 Proposed Method for Human Action Recognition 
15.3.1 Data Collection Overview 


In this research, the experimental dataset has been acquired from 30 people in the age 
range of 19 to 48 years who participated in the trials. Six fundamental tasks were carried 
out by them: three static postures and three dynamic activities [31, 32]. Between the static 
postures in the trial, postural changes also took place: stand to sit, sit to stand, lie to sit, 
stand to lie, and lie to stand. We recorded 3-axial linear acceleration and 3-axial angular 
velocity at a constant rate of 50 Hz using the device's built-in accelerometer and gyroscope. 


15.3.2 Signal Processing 


After applying noise filters as a preprocessing step, the accelerometer and gyroscope
sensor data were sampled with fixed-width sliding windows of 2.56 seconds. A Butterworth
low-pass filter was used to separate the gravitational and body motion components of the
recorded signals [33].
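
As an illustration of the windowing step, the following sketch cuts a sampled signal into fixed-width windows. At 50 Hz, a 2.56 second window holds 128 samples. The 50% overlap between consecutive windows is an assumption, since the overlap is not stated above, and the function name and toy signal are hypothetical.

```python
SAMPLING_RATE_HZ = 50    # from the text
WINDOW_SECONDS = 2.56    # from the text
OVERLAP = 0.5            # assumption: the overlap is not stated in the text

def sliding_windows(samples, rate_hz=SAMPLING_RATE_HZ,
                    seconds=WINDOW_SECONDS, overlap=OVERLAP):
    """Split a 1-D sequence of samples into fixed-width, overlapping windows."""
    size = round(rate_hz * seconds)                # 128 samples per window
    step = max(1, round(size * (1 - overlap)))     # 64 samples between window starts
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

# Toy signal: 512 consecutive sample indices standing in for one sensor axis.
signal = list(range(512))
windows = sliding_windows(signal)
print(len(windows), len(windows[0]))
```

Each window would then be summarized by time- and frequency-domain features before classification.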
15.3.3 Feature Selection


The features selected for this database come from the accelerometer and gyroscope 3-axial
raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals were captured at a
constant rate of 50 Hz. To remove noise from the body acceleration signal, a 3rd-order
Butterworth low-pass filter with a corner frequency of 20 Hz was used; for the gravity
acceleration signal, a Butterworth low-pass filter with a corner frequency of 0.3 Hz was used.
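
To give a feel for what the corner frequency means, the sketch below evaluates the magnitude response of an ideal analog low-pass Butterworth filter, |H(f)| = 1 / sqrt(1 + (f / fc)^(2n)). This is the textbook analog prototype, not the digital filter applied to the dataset, and the function name is hypothetical.

```python
import math

def butterworth_gain(freq_hz, cutoff_hz, order):
    """Magnitude response of an ideal analog low-pass Butterworth filter."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))

# Noise filter from the text: 3rd order, 20 Hz corner frequency.
for f in (1, 10, 20, 25):
    print(f, "Hz ->", round(butterworth_gain(f, 20.0, 3), 3))
```

At the corner frequency the gain is 1/sqrt(2) (about -3 dB) for any order; a higher order makes the roll-off above the corner steeper.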


15.3.4 Exploratory Data Analysis 


The dataset was taken from the University of California Irvine (UCI) machine learning
repository [34]. The data collection involved 30 volunteers between the ages of 19 and 48.
Each participant completed the six tasks described in Section 15.3.1. Each record in the
dataset is a 561-feature vector of time- and frequency-domain variables [35-37]. There
are 10,299 records in total, split 70/30 between the training and test datasets. The six
classes are then encoded as the numbers 1 to 6.


15.3.5 Data Preprocessing 


The data were checked for duplicates and missing values, and none were found [38-40].
Next, the class balance was checked; the result is depicted in Figure 15.2, which shows
the human activities in the training dataset and their counts. There are about the same
number of observations across all six activities, indicating that there is no
significant class imbalance in the data.
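
The three checks described above can be sketched in a few lines. This is only an illustration on made-up rows, not the code used in the study; the function name and the toy values are hypothetical.

```python
import math
from collections import Counter

def check_records(rows, labels):
    """Return (duplicate row count, missing value count, class counts)."""
    duplicates = len(rows) - len({tuple(r) for r in rows})
    missing = sum(
        1 for r in rows for v in r
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return duplicates, missing, Counter(labels)

# Toy feature rows and activity labels (made-up values).
rows = [[0.28, -0.02], [0.27, -0.01], [0.29, -0.03], [0.26, -0.04]]
labels = ["SITTING", "LAYING", "SITTING", "LAYING"]
duplicates, missing, counts = check_records(rows, labels)
print(duplicates, missing, dict(counts))
```

A ratio of the largest to the smallest class count close to 1 indicates a balanced dataset.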


15.3.6 Exploratory Data Analysis for Static and Dynamic Activities 


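Static activities, those with no movement, are shown in Figure 15.3.
From Figure 15.3, static and dynamic activities can be separated by thresholding the
tBodyAccMag-mean() feature:


# row: feature values of one observation, keyed by feature name
if row["tBodyAccMag-mean()"] <= -0.5:
    activity = "static"
else:
    activity = "dynamic"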


Figure 15.4 presents the graphical representation of static and dynamic activities. Figure 15.5 
presents a box plot of body acceleration magnitude mean across all the six categories. 


FIGURE 15.1
Proposed models for HAR in present research. (Diagram blocks: data collection; feature
selection; exploratory data analysis; ensemble machine learning to classify the dataset;
identification of best model.)
FIGURE 15.2
Bar plot for distribution of activities.


# row: feature values of one observation, keyed by feature name
if row["tBodyAccMag-mean()"] <= -0.8:
    activity = "static"

if row["tBodyAccMag-mean()"] >= -0.6:
    activity = "dynamic"


Also, we can easily separate the WALKING_DOWNSTAIRS activity from others using a 
box plot. 


# row: feature values of one observation, keyed by feature name
if row["tBodyAccMag-mean()"] > 0.02:
    activity = "WALKING_DOWNSTAIRS"
else:
    activity = "others"
FIGURE 15.3
Density plot for static and dynamic activities by the body acceleration magnitude mean feature.

FIGURE 15.4
Graphical representation of static and dynamic activities.

FIGURE 15.5
Box plot of the body acceleration magnitude mean feature, tBodyAccMag-mean(), across activities.

FIGURE 15.6
Box plot of angle(X,gravityMean) across activities: the angle between the x-axis and the
gravity mean feature.


Angle(X,gravityMean) perfectly separates LAYING from other activities as shown in 
Figure 15.6. 


# row: feature values of one observation, keyed by feature name
if row["angle(X,gravityMean)"] > 0.01:
    activity = "LAYING"
else:
    activity = "others"
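

The threshold rules read from the box plots can be combined into one small helper. The sketch below is illustrative only: the function name, the dictionary layout, the order in which the rules are tried, and the toy feature values are assumptions, while the thresholds are the ones quoted in this section.

```python
def coarse_activity(row):
    """Apply the box-plot thresholds from the text to one observation.

    `row` is a hypothetical dict keyed by feature name.
    """
    if row["angle(X,gravityMean)"] > 0.01:
        return "LAYING"
    mag = row["tBodyAccMag-mean()"]
    if mag > 0.02:
        return "WALKING_DOWNSTAIRS"
    if mag <= -0.8:
        return "static"
    if mag >= -0.6:
        return "dynamic"
    return "undecided"  # neither threshold rule applies between -0.8 and -0.6

# Toy observations (made-up feature values).
print(coarse_activity({"angle(X,gravityMean)": 0.5, "tBodyAccMag-mean()": -0.95}))
print(coarse_activity({"angle(X,gravityMean)": -0.8, "tBodyAccMag-mean()": -0.3}))
```

Such hand-written rules only give a coarse grouping; the classifiers in Section 15.4 are still needed to tell all six activities apart.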


Similarly, the angle between the y-axis and the gravity mean feature is shown in
Figure 15.7.

FIGURE 15.7
Box plot of angle(Y,gravityMean) across activities: the angle between the y-axis and the
gravity mean feature.


15.3.7 Visualizing Data Using t-SNE 


A very good visual separation can be achieved in two dimensions using the PCA method
[41]. Hence an experiment with multivariate dimensionality reduction can be done
efficiently. The nonlinear dimensionality reduction method known as t-distributed
stochastic neighbor embedding (t-SNE) is especially well suited to displaying
high-dimensional datasets [42]. It is widely used for speech, genetic data, natural
language processing (NLP), and image data. t-SNE maps data from a very high-dimensional
space to a low-dimensional one while still preserving a significant amount of the
original information [43, 44]. Although the training data comprise 561 distinct features,
t-SNE allows each of the six activities to be viewed and differentiated in a 2D space
(Figure 15.8).
FIGURE 15.8
t-SNE activity visualization. 


15.4 Machine Learning Algorithm 
15.4.1 Logistic Regression


As a benchmark model, logistic regression (LR) fits its parameters by maximizing the
likelihood of the data. This model is simple and lacks the intricacy of neural networks.
LR performs well when the data lie on the proper side of the separating hyperplane, that
is, when the classes are close to linearly separable. To compare the performance, a kernel
technique with the "lbfgs" optimizer can be used. Prior studies using LR in this
configuration have produced promising outcomes [44-46], which makes the algorithm
suitable for the present research (Table 15.1).


TABLE 15.1
Best Parameters for LR Model [47]

Parameters     Values                            Description
Regularizer    L2 [L1, L2]                       Regularization
C              2 [0.1, 0.5, 1, 2, 5, 10, 100]    Penalty parameter C
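
To show how the regularizer and the penalty parameter C interact, the following sketch evaluates an L2-penalized logistic loss for a binary problem. It assumes one common convention, 0.5 * ||w||^2 + C * (sum of log-losses), in which a larger C weakens the regularization; the function names and toy data are hypothetical, and the six-class HAR problem would need a multiclass extension.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def l2_penalized_loss(weights, bias, rows, labels, C=2.0):
    """0.5 * ||w||^2 + C * (sum of log-losses); labels are 0 or 1."""
    log_loss = 0.0
    for x, y in zip(rows, labels):
        p = predict_proba(weights, bias, x)
        log_loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    penalty = 0.5 * sum(w * w for w in weights)
    return penalty + C * log_loss

# Toy one-feature data (made-up values): 0 = static, 1 = dynamic.
rows = [[-0.9], [-0.8], [0.1], [0.2]]
labels = [0, 0, 1, 1]
print(l2_penalized_loss([0.0], 0.0, rows, labels))
print(l2_penalized_loss([2.0], 0.7, rows, labels))
```

An optimizer such as L-BFGS searches for the weights and bias that minimize this quantity.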
15.4.2 Random Forest


The random forest (RF) model is an ensemble of several decision trees. It combines the
predictions of many learners, which improves the overall result. The criterion parameter
measures the quality of a split: Gini measures impurity, and entropy measures information
gain. The other parameters are N estimators, the number of trees in the forest [48, 49],
and max depth, the maximum depth of a tree (Table 15.2).


15.4.3 Decision Tree 


A decision tree is a tree-shaped structure of decisions and their possible outcomes [51].
It classifies instances by sorting them from the root down to a leaf. Here, Gini is used to
measure impurity and entropy to measure information gain [52]; both gauge how well a split
was made. The maximum depth of the tree is searched over the range from 0 to 8 [53]
(Table 15.3).
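
The two split criteria can be written out directly. The sketch below computes the Gini impurity, the entropy, and the impurity reduction of a candidate split; the function names and the toy labels are hypothetical and serve only to illustrate the definitions.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children

# Toy node with two equally frequent activities, split perfectly in two.
parent = ["SITTING"] * 4 + ["WALKING"] * 4
left, right = parent[:4], parent[4:]
print(gini(parent), entropy(parent))
print(split_gain(parent, left, right, gini), split_gain(parent, left, right, entropy))
```

With entropy as the impurity function, the value returned by `split_gain` is the information gain of the split.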


15.4.4 Support Vector Machine 


One of the selected ML algorithms is the support vector machine (SVM) [54]. SVM is a
discriminative classifier defined by a separating hyperplane. A new data point is
categorized according to the side of the optimal SVM hyperplane on which it falls [55].
In two-dimensional space, the classes lie on either side of this hyperplane [56]. The
support vector classifier (SVC) for more than two classes is one of the SVM variations.
In this study, SVC with kernels from the “linear,” “rbf,” and “sigmoid” families is used.
When there are many features, SVM often offers decent accuracy (Table 15.4).


TABLE 15.2
Best Parameters for RF Model [50]

Parameters      Values                     Description
Criterion       Gini [Gini, Entropy]       Gini is impurity based; entropy is information gain based
N Estimators    90 [10, 150]               Number of trees
Max depth       6 [0, 2, 4, 6, 8, 10]      Maximum depth of the tree


TABLE 15.3
Best Parameters for DT Model [52]

Parameters      Values                     Description
Criterion       Gini [Gini, Entropy]       Gini is impurity based; entropy is information gain based
Max depth       4 [0, 2, 4, 6, 8, 10]      Maximum depth of the tree


TABLE 15.4
Best Parameters for SVM Model [56]

Parameters      Values                            Description
Kernel          Linear [Linear, RBF, Sigmoid]     Kernel type used by the model
C               1 [0.1, 0.5, 1, 2, 5, 10, 100]    Penalty parameter C


15.4.5 K Nearest Neighbor (KNN) 


KNN is one of the simplest nonparametric supervised ML algorithms [57, 58]. It assigns an
unknown data point to a category using the Euclidean distance to its neighbors and a
specific value of K. Here the dataset is classified with K ranging from 1 to 5, the
default Minkowski metric, and p = 2, which gives the standard Euclidean metric [59]
(Table 15.5).
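
A minimal sketch of the KNN decision rule follows. It is not the library implementation used for the experiments: the function names and the toy points are hypothetical. The Minkowski distance with p = 2 reduces to the Euclidean distance, and the predicted class is the majority label among the K nearest training points.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_x, train_y, query, k=5, p=2):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: minkowski(pair[0], query, p),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy two-feature training set (made-up values).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["SITTING"] * 3 + ["WALKING"] * 3
print(knn_predict(train_x, train_y, [0.5, 0.5], k=5))
print(knn_predict(train_x, train_y, [5.5, 5.5], k=3))
```

Because every prediction scans all training points, plain KNN becomes slow on large datasets unless a spatial index is used.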


15.4.6 Naive Bayes 


Naive Bayes (NB) is a probability-based ML technique that classifies objects based on
their features [60-62]. No single feature is sufficient to discriminate one object from
another; the features are combined under the assumption that they are independent of one
another given the class, which is why the method is called naive. Three types of naive
Bayes classifier model are commonly used: multinomial, Bernoulli, and Gaussian (Table 15.6).
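
The Gaussian variant can be sketched as follows. Each feature is modeled per class by a normal distribution, and the independence assumption lets the per-feature log-likelihoods simply be added. The smoothing term here is assumed to be a fraction of the largest feature variance added to every variance; the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def fit_gaussian_nb(rows, labels, var_smoothing=1e-9):
    """Return {label: (log prior, [(mean, variance) per feature])}."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_features = len(rows[0])
    # Assumed smoothing: a fraction of the largest overall feature variance.
    eps = var_smoothing * max(_variance([r[j] for r in rows]) for j in range(n_features))
    model = {}
    for label, members in by_class.items():
        stats = []
        for j in range(n_features):
            column = [r[j] for r in members]
            stats.append((sum(column) / len(column), _variance(column) + eps))
        model[label] = (math.log(len(members) / len(rows)), stats)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)

# Toy two-feature data (made-up values).
rows = [[-0.90, 0.10], [-0.95, 0.20], [-0.85, 0.15],
        [0.10, 0.30], [0.20, 0.10], [0.15, 0.20]]
labels = ["static"] * 3 + ["dynamic"] * 3
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [-0.9, 0.15]), predict_gaussian_nb(model, [0.12, 0.2]))
```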


15.4.7 Data Preprocessing 


The dataset has train and test parts, with no redundant or missing values. At first, the
data frame contains both train and test data. To shrink the feature space, we attempted
feature selection by computing ANOVA F-values for the dataset and considering the top 100,
200, 400, and 500 attributes with the highest F-values. However, the accuracy improved
dramatically each time features were added, for all the algorithms, as seen in Figure 15.9.
As a result, we train and test using all the features. The learning has two phases. First,
the ideal parameters of all of the algorithms are learned. The algorithms are then compared
in the second phase. Cross-validation uses k = 5 folds, and stratified sampling is used to
divide the data. Without modifying the test data, we cross-validate using 70% of
the training data.
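
The ANOVA F-value used for ranking features is the ratio of the between-class variance to the within-class variance of a single feature. The following sketch computes it and selects the top-scoring columns; the function names and the toy columns are hypothetical and only illustrate the idea.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))

def top_k_features(columns, labels, k):
    """Indices of the k feature columns with the highest F-values."""
    scores = [anova_f(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: the first column separates the two classes, the second does not.
labels = ["static"] * 3 + ["dynamic"] * 3
informative = [1, 2, 3, 7, 8, 9]
uninformative = [1, 2, 3, 1, 2, 3]
print(anova_f(informative, labels), anova_f(uninformative, labels))
print(top_k_features([uninformative, informative], labels, k=1))
```

A high F-value means the class means of that feature differ strongly relative to the spread within each class.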


TABLE 15.5
Best Parameters for KNN Model

Parameters      Values       Description
N. neighbors    5 [1, 5]     Number of neighbors used by the algorithm
Metrics         Minkowski    Distance measure between the points
P               2            Standard Euclidean metric


TABLE 15.6
Best Parameters for Naive Bayes Model

Parameters       Values            Description
var smoothing    default = 1e-9    Variances for calculation stability
FIGURE 15.9
Accuracy of the seven algorithms by increasing features. 


15.5 Experimental Results 


To identify the ideal hyperparameters, a grid search with 5-fold stratified
cross-validation is used. We then run the models with the best parameters on the test
data, after which a statistical significance test is performed. Among the evaluation
metrics, accuracy is calculated as the ratio of the number of correct predictions to the
total number of predictions [63-65]. This metric is essential because it describes how
often the model correctly identifies, for example, whether a person is sitting or walking.
Accuracy is the main metric used to differentiate between these models. Using true
positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the
accuracy score is obtained from Equation (15.1).

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15.1} \]


Additionally, we use precision, recall, and F1 score as evaluation metrics. These three
metrics are defined in Equations (15.2)-(15.4).


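\[ \text{Precision} = \frac{TP}{TP + FP} \tag{15.2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{15.3} \]

\[ F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15.4} \]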
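

Equations (15.1)-(15.4) can be checked with a few lines of code. The sketch below computes the four metrics from the confusion counts of one class; the function name and the counts are made up for illustration. For the six-class HAR problem, the counts would be taken per class in a one-versus-rest manner.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from the confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy confusion counts (made-up values).
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```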


Figures 15.10 and 15.11 show the confusion matrix and classification report for logistic
regression; Figures 15.12 and 15.13 show random forest; Figures 15.14 and 15.15 show
decision trees; Figures 15.16 and 15.17 show linear SVM; Figures 15.18 and 15.19 show
kernel SVM; Figures 15.20 and 15.21 show KNN; and Figures 15.22 and 15.23 show naive Bayes.
FIGURE 15.10
Confusion matrix for LR.
FIGURE 15.11
Classification report for LR.
FIGURE 15.12
Confusion matrix for RF.
FIGURE 15.13
Classification report for RF.
FIGURE 15.14
Confusion matrix for DT.
FIGURE 15.15
Classification report for DT.
FIGURE 15.16
Confusion matrix for linear SVM.
FIGURE 15.17
Classification report for linear SVM.
FIGURE 15.18
Confusion matrix for kernel SVM.
FIGURE 15.19
Classification report for kernel SVM.
FIGURE 15.20
Confusion matrix for KNN.
FIGURE 15.21
Classification report for KNN.
FIGURE 15.22
Confusion matrix for naive Bayes.
FIGURE 15.23
Classification report for naive Bayes.
15.6 Conclusion


In conclusion, linear SVM performs better than the other six algorithms, though RF,
kernel SVM, and LR come close in accuracy. Naive Bayes cannot compete with linear and
kernel SVM on this dataset. As future scope of the research, a hybrid machine learning
algorithm can be used to explore the performance of human activity recognition. In
addition, a significant amount of hyperparameter tuning is required before other machine
learning techniques are applied.